ABSTRACT

The least squares fitting process is presented as a method of data
reduction. The general strategy is to consider fitting (linear)
models as partitioning data into a fit and residuals. The fit can be
parsimoniously represented by a summary of the data. A fit is
considered adequate if the residuals are small enough that
manipulating their signs and locations does not affect the summary
by more than a prespecified amount. The effect of the residuals on
the summary is shown to be (approximately) characterized by the
output of standard regression programs. The general process of
fitting linear models by least squares is covered in detail and
discussed briefly in its relationship to standard hypothesis testing
and to Fisher's randomization test. Fitting in weighted least squares
and a comparison of fitting to standard methods are also discussed.
It is shown that some of the output (e.g., standard errors, t, F, and
p statistics) from standard regression programs can be interpreted as
approximate measures of goodness of fit of a model to the observed
data. The interpretation is also applicable in weighted least squares
situations such as robust regression. (PN)
ABSTRACT

Least squares fitting is perhaps the most commonly used tool
of statisticians. Under sampling assumptions, statistical inference
makes possible the estimation of population parameters and
their confidence intervals and also the testing of hypotheses. In
this paper the properties of least squares fitting are examined
without sampling assumptions. It is shown that some of the output
(e.g., standard errors, t, F, and p statistics) from standard
regression programs can be interpreted as (approximate)
measures of goodness of fit of a model to the observed data. The
interpretation is also applicable in weighted least squares
situations such as robust regression.
1. INTRODUCTION

Apparently, fitting simple functions to observed data is a
very basic human function. Art students are taught that an
observer will see familiar shapes such as lines, squares, or
circles even when only a few points or segments are actually
present. Science students are taught that finding simple functions
that represent complex events is one of the fundamental procedures
of science. Given the breadth of application, it is no wonder that
fitting functional forms to data has such an important position in
statistical theory.

Statistical theory enhances fitting procedures in many ways.
Given assumptions about a population and sampling procedures, we
may infer that the fit from a random sample is an unbiased
estimator of a population parameter, and we can estimate the
sample-to-sample variance of the parameter estimate. Under correct
conditions, we may develop a confidence interval for a parameter or
test a hypothesis that the parameter is some known value. Clearly,
statistical theory adds substantially to the interpretation of
fitting, at the cost of collecting (or assuming) a random sample
and making assumptions about the population distribution of
residuals. Statistical theory also dictates, to some degree, which
functions of the data (e.g., the standard error of a mean) we
interpret.
However, the estimation of population parameters is but one of
the important functions of statistics. R. A. Fisher (1930) wrote
on the first page of Statistical Methods for Research Workers,
"Statistics may be regarded as (i) the study of populations, (ii)
as the study of variations, (iii) as the study of the methods of
the reduction of data." He also wrote of (page 5) "the practical
need to reduce the bulk of any given body of data," and later
(page 6) "We want to be able to express all the relevant information
contained in the mass by means of comparatively few numerical
values." Although random sampling may be important in estimating
population parameters, there is no reason to forego data reduction
in nonrandom samples as long as one is careful not to make the
inferences that only random sampling allows.

The purpose of this paper is to discuss the fitting process as
a method of data reduction. No assumptions about the sampling
process or the population distribution will be made. Clearly,
whenever the usual statistical assumptions are plausible, standard
procedures of statistical inference should be used, but the concern
here is with data for which the assumptions are quite inappropriate.
The general strategy is to consider fitting (linear) models as
partitioning data into a fit and residuals. The fit can be
parsimoniously represented by a summary of the data. A fit is
considered adequate if the residuals are small enough that
manipulating their signs and locations does not affect the summary
by more than a prespecified amount. The effect of the residuals on
the summary is shown to be (approximately) characterized by the
output of standard regression programs. Decision rules for accepting
a fit will be proposed, and these rules will be shown to be
equivalent in large samples to standard hypothesis tests.

That some of the statistics from a regression analysis can be
interpreted as descriptive statistics is known (e.g., regression
coefficients, squared multiple correlation), but this paper also
shows possible interpretations of the covariance of regression
coefficients and of the t, F, and "probability" statistics. Not much
work seems to have been done in this area. A paper by Freedman
and Lane (1978) does approach the problem of nonstochastic
significance testing; although they differ in purpose and approach,
they come to similar conclusions where their work overlaps with what
is covered here. The bootstrap of Efron (1979) is in a similar
spirit.

The next section of this paper will cover in detail the
general process of fitting linear models by least squares and
discuss briefly its relationship to standard hypothesis testing
and to Fisher's randomization test. The following section will
discuss fitting in weighted least squares and compare fitting to
standard methods. Most proofs are relegated to the appendix.






2. CASE I: ORDINARY LEAST SQUARES

Let us assume that we have a set of data and wish to fit a
linear model using the least squares criterion. The data
requirements and the notation used in this paper are shown in Table 1.
All data elements are known, fixed, finite, real numbers. The
matrix W is a diagonal matrix of weights which will be discussed
later and can be assumed equal to an identity matrix here. Least
squares fitting is a matter of algebra and, as long as $X'X$ has an
inverse, a unique equation can be fitted. The computation of the
least squares coefficients can be performed using a standard
regression program.



Insert Table 1 about here 



The output of most regression programs consists not only of
the regression coefficients but also of a number of other statistics
associated with regression analysis. Table 2 contains a fairly
extensive list of regression statistics which might be included in
computer output and a formula for each. That computer programs
differ in internal algorithm or precision does not concern us here;
we will assume that enough precision is kept so that rounding error
can be ignored. The derivation and interpretation of these
statistics using sampling theory are too well known to repeat here
(see, for example, Graybill 1961, Draper and Smith 1966, Daniel and
Wood 1971, Searle 1971, etc.).

Insert Table 2 about here



The interpretation of some of these statistics without
sampling theory is also well known. Fisher (1930) showed that a
regression coefficient may be considered as a weighted average of
the response variable y where the weights are a function of a
regressor variable. Tukey's catchers (Beaton and Tukey 1974) may
be thought of as sets of weights to be applied to the response
variable in order to form partial regression coefficients. Thus,
any regression coefficient may be conceived of as a weighted average
of the response variable. The standard error of estimate is a
measure of how well the linear model fits the data, although
division by the number of degrees of freedom comes from sampling
considerations whereas division by the sample size would seem more
appropriate for descriptive purposes. The squared multiple
correlation may be interpreted as a relative measure of goodness of
fit. However, some common regression outputs, the cov(b), $t_j$,
$p(t_j)$, F, and p(F), are not usually interpreted as descriptive
statistics but from consideration of the inferred behavior of
different random samples. We will show below that these statistics
also permit interpretation as measures of goodness of fit without
recourse to sampling theory.

The premise of this paper is that data reduction is a sufficient
reason for data analysis. We wish to reduce a mass of data
into a few numbers which characterize or summarize an interesting
relationship between the response vector y and the regressor
matrix X. We often do this by fitting a linear (in the parameters)
model by least squares. Since the regression coefficients
are weighted averages of the response values, they may be considered
as summary statistics. A summary will in most cases lose some of
the information in the original data in the sense that the response
variable cannot be exactly reconstructed from X and the summary,
b. To judge the adequacy of the summary, we will develop measures
of how well the original data can be reconstructed from the summary
and how sensitive the summary is to the information lost in the data
reduction.

Fitting equations to data may be considered as a way of
partitioning a set of observations, y, into two parts, a fit,
$\hat{y}$, and residuals, e; that is,

$$y = \hat{y} + e \qquad (2.1)$$
(data) = (fit) + (residuals)

where the values of $\hat{y}$ are related to the regressors X by the
linear function $\hat{y} = Xb$ and the residuals $e = y - \hat{y}$
are what is left over. A data summary b is considered good if
$\hat{y} \approx y$, that is, if the actual values y are
satisfactorily reconstructed from the fitted values, which implies
that $e \approx 0$ and thus may be ignored.

The residuals are, of course, minimal in the least squares sense,
but minimal does not necessarily mean small.

We will here consider the vector e to be small if its
elements $e_i$ are so close to zero that we can be indifferent to

1. changes in the signs of the $e_i$, and

2. rearrangements of the locations of the $e_i$;

that is, we will consider the vector e to be small if we can
rearrange its elements and change some or all of their signs to
form a vector $e_k$, say, then create a pseudo-data vector
$y_k = \hat{y} + e_k$ and still have $y \approx y_k$. We will judge
the closeness of $y_k$ to y by summarizing $y_k$ in the same way
that y was summarized, that is, regress $y_k$ on X, and see
whether the summary $b_k$, say, of $y_k$ is reasonably close to b.
If a large proportion of all possible $b_k$ are close to b in the
sense discussed below, then we will consider the fit to be adequate.

There is a large number of ways in which the signs and locations
of the $e_i$ can be altered; there are $2^N$ different possible
arrangements of the signs and $N!$ different ways to permute the
elements of e; thus there are

$$K = 2^N N!$$

(not necessarily distinct) possible signed permutations of e.
Let us denote each possible signed permutation of e as $e_k$ where
k = 1,2,...,K. The order in which the signed permutations are
arranged is not important here, but for convenience we will denote
$e_1 = e$, the vector with elements in the original order and with no
sign changes.
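
The counting argument can be checked by brute force for a small N. The following is a minimal sketch, not part of the original paper; the residual vector and the function name `signed_permutations` are hypothetical and serve only as an illustration.

```python
from itertools import permutations, product
from math import factorial

def signed_permutations(e):
    """Yield every signed permutation of e; the first one is e itself."""
    for perm in permutations(e):
        for signs in product((1, -1), repeat=len(e)):
            yield tuple(s * v for s, v in zip(signs, perm))

e = (7, -8, -1, -2, 4)              # hypothetical residuals, N = 5
e_all = list(signed_permutations(e))
K = 2 ** len(e) * factorial(len(e))  # K = 2^N N!

# Count by formula, count by enumeration, and number of distinct vectors.
print(K, len(e_all), len(set(e_all)))
```

Because the hypothetical residuals have distinct nonzero absolute values, all K signed permutations are distinct here; with ties or zeros among the residuals some of them would coincide, which is why the text says "not necessarily distinct."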

It is convenient to write the $e_k$ as the product of a signed
permutation matrix and the vector e, i.e.,

$$e_k = P_k e \qquad (2.2)$$

where $P_k$ is of order N x N, has one nonzero element in each row
and column, and that nonzero element is +1 or -1. The locations
of the nonzero elements determine the permutation and the ±1
determines the sign changes. Since $e_1 = e$, $P_1$ is an N x N
identity matrix.

In this notation, the pseudo-data vector is

$$y_k = \hat{y} + P_k e \qquad (2.3)$$

and the summary computed from $y_k$ is

$$b_k = C'y_k = b + C'P_k e , \qquad (2.4)$$

where C is the catcher matrix. The judgment of goodness of fit will
be based on the differences between the $b_k$ and b.
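
Equations (2.3) and (2.4) can be traced numerically. The sketch below is an illustration only: the data are hypothetical (Figure A is not reproduced in this text), chosen so that the least squares line has intercept 2 and slope 3, and the helper `summarize` is an assumed name. Exact rational arithmetic is used so that the identity in (2.4) can be checked without rounding.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]           # hypothetical regressor
y = [12, 0, 10, 12, 21]       # hypothetical response
n = len(x)
xbar = Fraction(sum(x), n)
sxx = sum((xi - xbar) ** 2 for xi in x)

# Catcher columns for a straight line: b = C'y.
c1 = [(xi - xbar) / sxx for xi in x]            # slope catcher
c0 = [Fraction(1, n) - xbar * c for c in c1]    # intercept catcher

def summarize(values):
    """Return (intercept, slope) = C'values."""
    return (sum(c * v for c, v in zip(c0, values)),
            sum(c * v for c, v in zip(c1, values)))

b = summarize(y)
fit = [b[0] + b[1] * xi for xi in x]
e = [yi - fi for yi, fi in zip(y, fit)]

# One signed permutation: change the sign of the first residual only.
e_k = [-e[0]] + e[1:]
y_k = [fi + ei for fi, ei in zip(fit, e_k)]     # pseudo-data, as in (2.3)
b_k = summarize(y_k)                            # summary of the pseudo-data
shift = summarize(e_k)                          # C'P_k e, as in (2.4)

print([float(v) for v in b], [float(v) for v in b_k])
```

The summary of the pseudo-data equals the original summary plus the catchers applied to the signed permutation of the residuals, so no refitting is needed to obtain each $b_k$.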

The notation for ordinary least squares is summarized in Table 3.






Insert Table 3 about here 



To illustrate the signed permutation scheme, a numerical
demonstration is shown in Figures A and B. Figure A shows a simple
data set with N = 5 and m = 1; thus, the regression line has
two coefficients, an intercept and a slope. The coefficients of the
best fit line, $b' = (2, 3)$, as well as the fit $\hat{y}$, the
residuals e, the catchers C, and the variance of the residuals
$\sigma^2$, are shown. If these residuals are small, then we should
be able to re-sign and rearrange the elements of e and then construct
data sets $y_k$, and the summaries of these data sets, $b_k$, should
be reasonably close to b. Figure B shows four such signed
permutations. The vector $e_2$ is the same as e except that the sign
of the first element is changed; the pseudo-data vector
$y_2 = \hat{y} + e_2$ is shown, as are the regression coefficients
$b_2' = (-9.2, 5.8)$ which result from the regression of $y_2$ on X.
Comparing $b_2$ to b shows that changing the sign of just one element
of e results in a regression line in which the intercept has a
different sign. $e_3$ contains the same elements as e except that the
first two elements are exchanged; the sign of the intercept in the
resulting regression coefficients $b_3$ is different from that of b.
In the two other examples, the signed permutation matrices were
chosen so as to identify the signed permutation with the largest
value of the intercept and the one that maximizes the slope.

Insert Figures A and B about here 



These are but four values of $e_k$; Figures C1 and C2 show
the distribution of all K = 3840 different possible values of the
slope and the intercept. The values of the intercept vary from -10
to +14 and about 38 percent differ in sign from the intercept
computed from the unmodified data; the values of the slope range
from -.6 to +6.6 with but about two percent differing in sign from
the original slope. The residuals are large enough so that their
signed permutation often results in intercepts which do not even
have the same sign as the original, although the residuals are not
so large as to affect the sign of the slopes in a vast majority of
cases.
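
A complete enumeration of this kind is easy to carry out for N = 5. The sketch below is an illustration under stated assumptions: the regressor and residuals are hypothetical values (Figures A through C2 are not reproduced in this text), chosen to be consistent with the summary b' = (2, 3) and with the ranges of the intercept and slope quoted above. The shares of sign changes it prints are properties of these hypothetical data.

```python
from fractions import Fraction
from itertools import permutations, product

x = [1, 2, 3, 4, 5]                  # hypothetical regressor
b = (Fraction(2), Fraction(3))       # intercept and slope of the fit
e = [7, -8, -1, -2, 4]               # hypothetical residuals
n = len(x)
xbar = Fraction(sum(x), n)
sxx = sum((xi - xbar) ** 2 for xi in x)
c1 = [(xi - xbar) / sxx for xi in x]            # slope catcher
c0 = [Fraction(1, n) - xbar * c for c in c1]    # intercept catcher

intercepts, slopes = [], []
for perm in permutations(e):
    for signs in product((1, -1), repeat=n):
        e_k = [s * v for s, v in zip(signs, perm)]
        # b_k = b + C'e_k, so no refitting is required.
        intercepts.append(b[0] + sum(c * v for c, v in zip(c0, e_k)))
        slopes.append(b[1] + sum(c * v for c, v in zip(c1, e_k)))

K = len(slopes)
share_int = Fraction(sum(v < 0 for v in intercepts), K)
share_slope = Fraction(sum(v < 0 for v in slopes), K)

print(K)
print(float(min(intercepts)), float(max(intercepts)), float(share_int))
print(float(min(slopes)), float(max(slopes)), float(share_slope))
```

Both original coefficients are positive, so a negative value counts as a change of sign; coefficients that land exactly on zero are not counted as changes in this sketch.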



Insert Figures C1 and C2 about here



If we consider the coefficients of a fit adequate when the
residuals are small enough such that the signed permutations of
the residuals will seldom, if ever, result in regression
coefficients with a different sign from the original, then the
following one-tailed decision rule for a single coefficient seems
appropriate:

Decision Rule 1: The single regression coefficient $b_j$
will be considered adequate if $100(1 - \alpha)\%$ of the
coefficients $b_{kj}$ have the same sign as $b_j$, and
inadequate otherwise, where $\alpha$ is a constant selected
by the fitter.

Values such as .05 and .01 seem appropriate for $\alpha$. If we are
concerned with large deviations $b_{kj} - b_j$ in either direction,
we may use a two-tailed decision rule:

Decision Rule 2: The fit of a regression coefficient $b_j$
will be considered adequate if $100(1 - \alpha)\%$ of the
coefficients $b_{kj}$ are no farther away in either direction
from $b_j$ than the point where the $b_{kj}$ have a
different sign from $b_j$, and inadequate otherwise,
where $\alpha$ is a constant selected by the fitter.

More stringent fitting rules are possible; for example, we might
require that the residuals be so small that their signed
permutation does not affect the first (second, etc.) significant
digit of a coefficient.
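
The two rules translate directly into counting functions. The sketch below is one possible reading, not the author's program: the function names are assumptions, the ten coefficient values are invented for illustration, and a coefficient equal to zero is treated as not having changed sign.

```python
def share_opposite_sign(b_j, b_kj):
    """Share of signed-permutation coefficients whose sign differs from b_j."""
    return sum(1 for v in b_kj if v * b_j < 0) / len(b_kj)

def share_as_far_as_zero(b_j, b_kj):
    """Share of coefficients at least as far from b_j, on either side, as zero is."""
    return sum(1 for v in b_kj if abs(v - b_j) >= abs(b_j)) / len(b_kj)

def adequate(share, alpha=0.05):
    """Adequate when at least 100(1 - alpha)% of the coefficients stay in bounds."""
    return share <= alpha

b_j = 3.0
b_kj = [3.0, 2.5, 3.5, -0.5, 6.5, 1.0, 5.0, 4.0, 2.0, 3.2]   # hypothetical values

one_tailed = share_opposite_sign(b_j, b_kj)     # Decision Rule 1
two_tailed = share_as_far_as_zero(b_j, b_kj)    # Decision Rule 2
print(one_tailed, adequate(one_tailed, 0.10))
print(two_tailed, adequate(two_tailed, 0.10))
```

The two-tailed share can never be smaller than the one-tailed share, so a coefficient may pass Rule 1 and fail Rule 2 at the same $\alpha$, as happens with these invented values at $\alpha$ = .10.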

In addition to assessing the adequacy of fit for each individual
regression coefficient, we may wish to judge the adequacy of fit of
the vector b taken as a whole. We might ask: in what proportion
of the cases did the signed permutation scheme result in vectors
$b_k$ in which all of the elements had different signs from b?
For the numerical example, the joint distribution of the slopes and
intercepts is shown in Figure C3; the absence of any points in the
third quadrant indicates that in no case did any $b_k$ differ from
b in all signs. However, a more interesting question might be:
what proportion of cases are as far away in any direction as the
origin, that is, the point beyond which all elements of $b_k$ have
different signs? To answer this question, we need a definition of
distance. Inspection of Figure C3 shows that the values of the
slopes and intercepts are not independent; in fact they are
correlated at about -.90. The choice of a particular $e_k$ affects
the elements of $b_k$ in a complementary manner; it seems natural,
therefore, to measure (squared) distance in a Mahalanobis-like
manner, that is, the squared distance of any vector $b_k$ from b is

$$d_k^2 = (b_k - b)' [\mathrm{cov}(b_k)]^{-1} (b_k - b) \qquad (2.5)$$

and, in the same metric, the squared distance of b from the point
beyond which all elements have different sign is

$$d^2 = b' [\mathrm{cov}(b_k)]^{-1} b . \qquad (2.6)$$

The distribution of all possible values of $d_k^2$ is shown in Figure
C4.



Insert Figures C3 and C4 about here 
Using this definition, we may form the following decision rule:

Decision Rule 3: The fit of the regression coefficients b
will be considered adequate if $100(1 - \alpha)\%$ of the
vectors $b_k$ are no farther away from b than the point
where all elements of $b_k$ have different signs from
those of b, and inadequate otherwise, where $\alpha$ is a
constant selected by the fitter.

It is unusual in statistical applications to be concerned about the
fit of all coefficients; in practice, the intercept is often
arbitrary and not of interest. Decision Rule 3 can be modified so
that it refers to any subset of the regression coefficients.
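
For a small sample, Decision Rule 3 can be applied by direct enumeration. The sketch below is an illustration only: the data are hypothetical, the covariance of the $b_k$ is taken to be $\sigma^2 (X'X)^{-1}$ with $\sigma^2 = e'e/N$ (the form used in the approximations of this section), and `sq_dist` is an assumed helper name.

```python
from fractions import Fraction
from itertools import permutations, product

x = [1, 2, 3, 4, 5]                  # hypothetical regressor
b = (Fraction(2), Fraction(3))       # intercept and slope of the fit
e = [7, -8, -1, -2, 4]               # hypothetical residuals
n = len(x)
xbar = Fraction(sum(x), n)
sxx = sum((xi - xbar) ** 2 for xi in x)
c1 = [(xi - xbar) / sxx for xi in x]            # slope catcher
c0 = [Fraction(1, n) - xbar * c for c in c1]    # intercept catcher
sigma2 = Fraction(sum(v * v for v in e), n)     # divisor N, not N - m - 1

def sq_dist(d0, d1):
    """d'[sigma2 (X'X)^-1]^-1 d = d'(X'X)d / sigma2 for X = [1, x]."""
    sx = sum(x)
    sx2 = sum(xi * xi for xi in x)
    return (n * d0 * d0 + 2 * sx * d0 * d1 + sx2 * d1 * d1) / sigma2

d2 = sq_dist(*b)                     # distance of b from the origin
d2_k = []
for perm in permutations(e):
    for signs in product((1, -1), repeat=n):
        e_k = [s * v for s, v in zip(signs, perm)]
        shift = (sum(c * v for c, v in zip(c0, e_k)),
                 sum(c * v for c, v in zip(c1, e_k)))
        d2_k.append(sq_dist(*shift))  # distance of b_k from b

share = Fraction(sum(v >= d2 for v in d2_k), len(d2_k))
print(float(d2), float(max(d2_k)), share)
```

With these hypothetical data no $b_k$ is as far from b as the origin is, so the fit as a whole would be judged adequate for any $\alpha$.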

Using these decision rules directly implies calculating all
K possible values of $b_k$ and this is clearly impossible except in
very small samples; in fact, for a modestly sized sample in which
N = 30, the number of possible sets of regression coefficients
is $K = 2.8 \times 10^{41}$ and thus cannot be calculated by simple
direct measures. Conceivably, Monte Carlo techniques could be used to
estimate the proportion of $b_k$ in any particular range, but such
would entail a large investment in computer time and programming.
We can, however, approximate the decision statistics for reasonably
large samples.
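
The size of K and the Monte Carlo alternative mentioned above can both be sketched briefly. This is an illustration, not a procedure from the paper: the function names, the seed, the number of draws, and the small data set are all assumptions.

```python
import random
from math import factorial

N = 30
K = 2 ** N * factorial(N)            # K = 2^N N!
print(f"K = {K:.1e}")

def random_signed_permutation(e, rng):
    """Draw one signed permutation of e uniformly at random."""
    shuffled = rng.sample(e, len(e))
    return [v if rng.random() < 0.5 else -v for v in shuffled]

def estimate_share(b_j, catcher_j, e, draws, rng):
    """Monte Carlo estimate of the share of b_kj whose sign differs from b_j."""
    hits = 0
    for _ in range(draws):
        e_k = random_signed_permutation(e, rng)
        b_kj = b_j + sum(c * v for c, v in zip(catcher_j, e_k))
        if b_kj * b_j < 0:
            hits += 1
    return hits / draws

rng = random.Random(0)                           # fixed seed for repeatability
slope_catcher = [-0.2, -0.1, 0.0, 0.1, 0.2]      # hypothetical, x = 1, ..., 5
residuals = [7, -8, -1, -2, 4]                   # hypothetical
estimate = estimate_share(3.0, slope_catcher, residuals, 20000, rng)
print(estimate)
```

Sampling replaces the full enumeration with a fixed number of draws, at the price of sampling error in the estimated share; the large-sample approximations developed next avoid both.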

Let us first examine the distribution of the $b_k$. We can
calculate the mean and covariance of $b_k$ since

$$\mathrm{ave}(b_k) = b \qquad (2.7)$$

$$\mathrm{cov}(b_k) = \sigma^2 (X'X)^{-1} \qquad (2.8)$$

and the skewness and kurtosis of the jth element of $b_k$ are

$$\mathrm{skew}(b_{kj}) = 0 \qquad (2.9)$$

[Equation (2.10), the kurtosis of $b_{kj}$, is not legible in the
source text.]

where $\beta_{2e}$ and $\beta_{2j}$ are the measures of the kurtosis
of e and the jth column of the catcher matrix respectively. (See
Table 9 for summary of definitions.) Note that these calculations are
exact, not estimates or approximations. The skewness is exactly
that of a normal distribution (as are all odd moments), and it is
easily seen that, given fixed values of the kurtosis of the
residuals and catcher vectors,

$$\lim_{N \to \infty} \mathrm{kurt}(b_{kj}) = 3 \qquad (2.11)$$

that is, as the sample size grows large, the kurtosis of each
$b_{kj}$ approaches the kurtosis of the normal distribution. Thus,
the distribution of the $b_{kj}$ has, in the limit, the same
characteristics as the normal curve with mean $b_j$ and variance
$\sigma^2 (X'X)^{jj}$ where $(X'X)^{jj}$ is the jth diagonal element
of $(X'X)^{-1}$.
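
That (2.7), (2.8), and (2.9) hold exactly, and not only in the limit, can be confirmed by enumeration on a small case. The sketch below is an illustration with hypothetical data; it uses exact rational arithmetic and takes $\sigma^2 = e'e/N$.

```python
from fractions import Fraction
from itertools import permutations, product

x = [1, 2, 3, 4, 5]                  # hypothetical regressor
b = (Fraction(2), Fraction(3))       # intercept and slope of the fit
e = [7, -8, -1, -2, 4]               # hypothetical residuals
n = len(x)
xbar = Fraction(sum(x), n)
sxx = sum((xi - xbar) ** 2 for xi in x)
c1 = [(xi - xbar) / sxx for xi in x]            # slope catcher
c0 = [Fraction(1, n) - xbar * c for c in c1]    # intercept catcher

b_all = []
for perm in permutations(e):
    for signs in product((1, -1), repeat=n):
        e_k = [s * v for s, v in zip(signs, perm)]
        b_all.append((b[0] + sum(c * v for c, v in zip(c0, e_k)),
                      b[1] + sum(c * v for c, v in zip(c1, e_k))))

K = len(b_all)
mean = [sum(bk[j] for bk in b_all) / K for j in (0, 1)]
cov = [[sum((bk[i] - mean[i]) * (bk[j] - mean[j]) for bk in b_all) / K
        for j in (0, 1)] for i in (0, 1)]
third = [sum((bk[j] - mean[j]) ** 3 for bk in b_all) for j in (0, 1)]

sigma2 = Fraction(sum(v * v for v in e), n)
# sigma2 (X'X)^-1 for X = [1, x], written out element by element.
expected = [[sigma2 * (Fraction(1, n) + xbar ** 2 / sxx), -sigma2 * xbar / sxx],
            [-sigma2 * xbar / sxx, sigma2 / sxx]]

print(mean == list(b), cov == expected, third == [0, 0])
print(float(cov[0][0]), float(cov[1][1]))
```

For these hypothetical data the two variances are 29.48 and 2.68, the values that appear in the z statistics of the numerical example.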

If we accept the normal distribution as close enough to
the distribution of $b_{kj}$ for our purposes, then we can use a
table of the normal curve to approximate the proportion of $b_{kj}$
within any particular range. Let us assume that we wish to
approximate the proportion of $b_{kj}$ that differ in sign from $b_j$
which was computed in the original solution. Since we know the mean
and variance of the $b_{kj}$ exactly, we can form the standard normal
deviate

$$z_j = \frac{b_j}{\sqrt{\sigma^2 (X'X)^{jj}}}$$

which can be referred to a table of the normal curve to find the
approximate proportion $p(z_j)$. If $p(z_j)$ is one-tailed, then it
is approximately the proportion of $b_{kj}$ with different sign than
$b_j$. If $p(z_j)$ is two-tailed, then it is approximately the
proportion of $b_{kj}$ as far away from $b_j$ as the point beyond
which the $b_{kj}$ differ in sign from $b_j$. The values of $p(z_j)$,
therefore, can be used as approximations for the values needed for
Decision Rules 1 and 2.
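
In place of a printed table, the normal tail area can be computed from the complementary error function. The sketch below is an illustration; the function names are assumptions, and the inputs are the coefficients (2 and 3) and variances (29.48 and 2.68) quoted for the numerical example.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2))

def sign_change_shares(b_j, var_bkj):
    """Return z_j and the one-tailed and two-tailed normal approximations."""
    z = abs(b_j) / sqrt(var_bkj)
    p_one = upper_tail(z)
    return z, p_one, 2 * p_one

results = {}
for name, b_j, var in (("intercept", 2.0, 29.48), ("slope", 3.0, 2.68)):
    results[name] = sign_change_shares(b_j, var)
    z, p1, p2 = results[name]
    print(f"{name}: z = {z:.3f}, one-tailed = {p1:.3f}, two-tailed = {p2:.3f}")
```

At $\alpha$ = .05 the slope passes the one-tailed rule and fails the two-tailed rule, while the intercept fails both.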

We can also develop an approximation for the proportion of
$b_k$ as far away from b as the point where all the elements of
$b_k$ differ in sign from b. The mean and variance of the squared
distances of the $b_k$ from b are

$$\mathrm{ave}(d_k^2) = m + 1 \qquad (2.12)$$

[Equation (2.13), the variance of $d_k^2$, is not legible in the
source text.]

The proofs are in the appendix. Both the mean and variance are
exact, not estimates or approximations. The mean is exactly that
of the $\chi^2$ distribution with m + 1 d.f. Given a fixed kurtosis
of the residuals, $\beta_{2e}$, then

$$\lim_{N \to \infty} \mathrm{var}(d_k^2) = 2(m + 1) \qquad (2.14)$$

since the contribution of $\beta_{2e}$ vanishes as the sample size
increases; thus the variance of the $d_k^2$ approaches the variance
of the $\chi^2$ distribution with m + 1 d.f. If we accept the
chi-squared distribution as close enough to the distribution of the
$d_k^2$ for our purposes, then the squared distance $d^2$ of the
point where all elements of $b_k$ have different sign from b can be
referred to a table of the chi-squared distribution for an
approximation of the percent of $b_k$ as far away as the origin or,
alternatively, the statistic

$$F^* = \frac{d^2}{\mathrm{ave}(d_k^2)} = \frac{b' [\mathrm{cov}(b_k)]^{-1} b}{m + 1} = \frac{y'X(X'X)^{-1}X'y}{\sigma^2 (m + 1)} \qquad (2.15)$$

may be computed and referred to an F table with m + 1 and
$\infty$ d.f. Let us call this proportion p(F*). p(F*) is thus an
approximation of the value for Decision Rule 3.
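
Referring F* to an F table with infinite denominator degrees of freedom is equivalent to referring $d^2$ to a chi-squared table, because the numerator d.f. times such an F variate is a chi-squared variate. The sketch below is an illustration restricted to an even number of d.f., where the chi-squared tail has a closed form; the function names are assumptions, and the input 25.9328 is the value of $d^2$ quoted for the numerical example.

```python
from math import exp, factorial

def chi2_upper_tail_even(x, df):
    """P(chi-squared with an even number of d.f. exceeds x), in closed form."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch covers even d.f. only")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

def p_f_star(d2, n_coef):
    """Return F* = d2 / n_coef and its tail area for n_coef and infinite d.f."""
    f_star = d2 / n_coef
    # With infinite denominator d.f., n_coef * F* is a chi-squared variate.
    return f_star, chi2_upper_tail_even(n_coef * f_star, n_coef)

f_star, p = p_f_star(25.9328, 2)
print(round(f_star, 3), p)
```

An odd number of coefficients would need the incomplete gamma function or a table instead of this closed form.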

As mentioned above, in many statistical applications it will be
of interest to measure the goodness of fit of some subset of b
instead of the entire vector; for instance, a researcher may be
interested in the goodness of fit of the slopes in a multiple
regression but not in the fit of the intercept. Let us call the
subset of interest $b_s$, which is of length $m_s$
($m_s \le m + 1$). Let us call $b_{ks}$ the equivalent subset of
$b_k$, the kth signed permutation summary. The question of interest
is: what proportion of the $b_{ks}$ are as far away from $b_s$ as the
point where all signs in $b_{ks}$ are different from $b_s$? The
squared distance of the vectors $b_{ks}$ from $b_s$ is

$$d_{ks}^2 = (b_{ks} - b_s)' [\mathrm{cov}(b_{ks})]^{-1} (b_{ks} - b_s) \qquad (2.16)$$

and the distance of the point where all $b_{ks}$ elements change
sign, the origin, from $b_s$ is

$$d_s^2 = b_s' [\mathrm{cov}(b_{ks})]^{-1} b_s . \qquad (2.17)$$

Corollary 1 in the appendix shows that the mean and variance of
$d_{ks}^2$ are

$$\mathrm{ave}(d_{ks}^2) = m_s \qquad (2.18)$$

[Equation (2.19), the variance of $d_{ks}^2$, is not legible in the
source text.]

The mean is the same as that of the $\chi^2$ distribution with $m_s$
d.f. and the variance in its limit

$$\lim_{N \to \infty} \mathrm{var}(d_{ks}^2) = 2 m_s \qquad (2.20)$$

approaches the variance of the chi-squared distribution; thus, if
we accept the chi-squared distribution for approximation purposes,
we may look up $d_s^2$ in a chi-square table or, alternately,
compute

$$\frac{d_s^2}{\mathrm{ave}(d_{ks}^2)} = \frac{b_s' [\mathrm{cov}(b_{ks})]^{-1} b_s}{m_s} \qquad (2.21)$$

which may be referred to the F table with $m_s$ and $\infty$ d.f. The
resultant value, p(F*), is approximately the proportion of $b_{ks}$
as far or farther away from $b_s$ as the origin.

If the researcher is interested in how close the values of the
fits $\hat{y}_k$ are to the original data points y, we can show that

$$\mathrm{ave}(\hat{y}_k) = \hat{y} \qquad (2.22)$$

$$\mathrm{cov}(\hat{y}_k) = \sigma^2 X(X'X)^{-1}X' \qquad (2.23)$$

Again using the normal distribution, the proportion of
$\hat{y}_{ki}$ in any interval can be approximated.

Table 4 summarizes the approximation scheme for the numerical
example. It should be remembered that the sample size is very
small, five, and that the residuals $e_i$ are not close to normally
distributed; the summary here is to show numerical calculations,
not the adequacy of the approximations. This table first shows the
mean, variance, skewness, and kurtosis for the distribution of
$b_{k0}$, the intercept, and $b_{k1}$, the slope, and the mean and
variance of the $d_k^2$, the squared distances. The statistics for
the approximation of the proportion of $b_{kj}$ with different signs
from the $b_j$ are

$$z_0 = 2 / \sqrt{29.48} = .368$$

$$z_1 = 3 / \sqrt{2.68} = 1.833$$

which result in the (one-tailed) proportions $p(z_0) = .36$ and
$p(z_1) = .03$, which are reasonably close to the actual proportions
$p(b_0)$ and $p(b_1)$ that were computed by counting. The value of
F*,

$$F^* = 25.9328 / 2 = 12.966 ,$$

should be referred to an F table with two and $\infty$ d.f., from
which the value p(F*) is found to be close to zero. The actual
proportion is exactly zero. Thus, according to the decision rules,
we would consider the intercept inadequate and consider the slope
adequate or inadequate depending on whether we used Decision Rule
1 or 2. The equation as a whole would be considered to fit
adequately.



Insert Table 4 about here 
2.1 Comment

It is interesting to note the similarities and differences
between classical hypothesis testing and the signed permutation
approach to goodness of fit. Classical hypothesis testing usually
assumes that there is a model in some population of the form

$$y = X\beta + \epsilon$$

and that the sample at hand was randomly selected from that
population; in return for these assumptions, probabilistic
statements about the unknown values of $\beta$ can be made. The
signed permutation approach makes no assumptions about the world
outside of the sample and thereby forfeits the right to make
statements outside of the sample itself. Given these major
differences, it is interesting to note that these two approaches
lead to very nearly the same calculations. Under its assumptions,
the usual hypothesis $\beta_s = 0$ (where $\beta_s$ includes all
$\beta_j$ except the intercept) leads to the statistics shown in
Table 2 and the only differences between these and signed
permutation statistics are

$$\sigma^2 = \frac{N - m - 1}{N} \, s_e^2 \qquad (2.24)$$

$$z_j = \sqrt{\frac{N}{N - m - 1}} \; t_j \qquad (2.25)$$

$$F^* = \frac{N}{N - m - 1} \, F \qquad (2.26)$$

and referring $t_j$ to a Student's t table and F to an F table
with N - m - 1 d.f. in the denominator. For large N and moderate
m, the factor N/(N - m - 1) is trivial, as is the difference between
the t and normal curve tables and the difference in degrees of
freedom in the denominator for the F table. Thus, the two
approaches lead to the same probability statistics.

This leads to the interesting fact that the probability of
finding a value as large (small) or larger (smaller) than the
sample value $b_j$ if the value of $\beta_j = 0$ in the population
is the same as the proportion of signed permutations with different
signs than $b_j$. Also, the probability of finding a vector $b_s$
as far away as the observed $b_s$ by chance if the equivalent subset
$\beta_s$ in the population was zero is the same in the limit as the
proportion of $b_{ks}$ as far away from $b_s$ as the point where all
elements of $b_{ks}$ have different signs from $b_s$.

That the two approaches arrive at similar places should not
be surprising. The usual sampling assumptions that E(e) = 0 and
$E(ee') = \sigma^2 I$ are analogous to lemmas 5a and 5b, which are
that $K^{-1} \sum_k e_k = 0$ and $K^{-1} \sum_k e_k e_k' = \sigma^2 I$
since the average residual is zero in ordinary least squares. With
the substitution of the expectations into the theorems in the
appendix, the same theorems would almost suffice for hypothesis
testing. The major difference would be the $E(s_e^2)$, which would
result in the need for a correction for degrees of freedom, and this
correction would be carried to the cov(b) and statistics derived
from it.

From the computational point of view, the statistics from a
standard regression program are close enough to the statistics for
signed permutations that they can be used directly in interpreting
goodness of fit if the sample size is much larger than the number
of variables.
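
The degrees-of-freedom factor that separates the two sets of statistics is a one-line rescaling. The sketch below is an illustration; the function name and the input values of t and F are hypothetical, and it assumes the residual variance for signed permutations uses the divisor N while the regression program uses N - m - 1.

```python
from math import sqrt

def signed_permutation_stats(t_j, F, N, m):
    """Rescale sampling-theory t and F by the factor N / (N - m - 1)."""
    factor = N / (N - m - 1)
    return sqrt(factor) * t_j, factor * F

# Very small sample: the factor 5/3 is far from one.
z_small, f_small = signed_permutation_stats(t_j=1.4195, F=2.0, N=5, m=1)
# Larger (hypothetical) sample: the factor is nearly one.
z_large, f_large = signed_permutation_stats(t_j=1.4195, F=2.0, N=500, m=1)

print(round(z_small, 3), round(f_small, 3))
print(round(z_large, 3), round(f_large, 3))
```

With N = 5 and m = 1 the rescaling is substantial; with N = 500 the rescaled values differ from the program output only in the third significant digit.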

2.2 Randomization Tests

The Fisher (1926)-Pitman (1937) method of randomization has 
great appeal because it is derived solely from the mechanics of 
randomization without any assumptions about the parent population. 
Basically, Fisher permutes residuals about a null model in which 
all parameters are specified and, since the parameters are often 
specified to be zero, the residuals may be the original data values 
y . Since sampling is involved, there is a major difference in 
approach between a randomization test and measuring goodness of fit 
by signed permutations. 

It is possible to view the signed permutation scheme as a
variation of a randomization test. First, the population from
which the sample was selected would have to be assumed to be
symmetric to justify assuming that $-e_i$ was as likely to occur as
$e_i$. Secondly, the residuals about the null model would be
re-signed and permuted instead of the residuals about the completely
fitted model as done when measuring goodness of fit. All of the
parameters would have to be specified since if any parameters were
fit from the data the residuals would be overfitted. With these
adaptations, a distribution of all possible results of an
experiment could be generated (for small N) and the probability of
any particular result calculated. The moments of the distribution
of results are easily calculated using the lemmas in the appendix.

3. WEIGHTED LEAST SQUARES

Weighted least squares is another common statistical technique
which has been used for a number of different purposes. A
few uses are:

1. Equalizing variance. In samples where the variances of
residuals are known a priori to be different at different parts of
the regression surface, the variances can be equalized using
weights which are proportional to the inverse of the square root
of the variance. In this case, the sampling assumptions and
inferences are well known.

2. Differential precision. In some fittings, the researcher
may be interested in fitting some segment of the regression plane
more carefully than other segments and does so by weighting the
residuals in that segment more heavily than the rest. This is
similar to the situation in survey research where various strata
are sampled with different probabilities to assure a fixed
representation of all strata, and the inverse of the sampling
probability is used as a weight when combining strata.
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3. Robust/resistant regression. Robust/resistant regression 
uses the sample residuals to form weights and, by discounting 
large residuals, reduces the effect of outliers on the fit (see 
Hosteller and Tukey 1977; Beaton and Tukey 1974). Robust/resistant 
regression may modify weights iteratively. 

The implications from sampling theory are quite different in 
the above cases, but the basic fitting procedure is the same. We 
will again use the definitions of data in Table 1. Note that the 
weights W are considered fixed, as are the data X and y. 
Where the weights came from is not important here; it is immaterial 
whether W came from sampling considerations or from iteratively 
reweighting residuals. All that is required is that X'WX has an 
inverse. 

The basic algebra of the fitting process is clear. If we 
wish to fit a linear model of the form 

    y = Xb + e                                      (3.1) 

subject to the condition that e'We be a minimum, then the value 
of b which minimizes e'We is 

    b = (X'WX)^{-1} X'Wy .                          (3.2) 

Thus the value of the fit is 

    \hat{y} = Xb                                    (3.3) 

and the residuals are 

    e = y - \hat{y} = y - Xb .                      (3.4) 

Weighted least squares, then, may also be considered as partitioning 
y into two parts: 

      y     =   \hat{y}   +       e                 (3.5) 
    (data)       (fit)       (residuals) 

If the fit \hat{y} and the summary b are to be accepted as adequate, 
then we should assure ourselves that the residuals are not large 
enough to affect either seriously. As with ordinary least squares, 
we may develop a metric of goodness of fit by examining the effect 
of signed permutations of the residuals. 
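
The fitting step in (3.2) through (3.4) can be sketched in a few lines. The sketch below is illustrative only: the five data points (x = 1, ..., 5 and y = 12, 0, 10, 12, 21) are an assumption, read back from the values that survive in the figures, and the function name wls_line is hypothetical. It handles only the straight-line case with an intercept.

```python
# Five-point example. x and y are an assumption read back from the figures;
# the weights are the arbitrarily chosen ones used for Cases II and III.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 0.0, 10.0, 12.0, 21.0]
w = [1.0, 4.0, 9.0, 4.0, 1.0]          # diagonal of W


def wls_line(x, y, w):
    """Solve (X'WX) b = X'Wy for the straight line y = b0 + b1*x (eq. 3.2)."""
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s_w * s_wxx - s_wx ** 2       # nonzero exactly when X'WX has an inverse
    b0 = (s_wxx * s_wy - s_wx * s_wxy) / det
    b1 = (s_w * s_wxy - s_wx * s_wy) / det
    return b0, b1


b0, b1 = wls_line(x, y, w)
fit = [b0 + b1 * xi for xi in x]                              # eq. 3.3
resid = [yi - fi for yi, fi in zip(y, fit)]                   # eq. 3.4
objective = sum(wi * ei * ei for wi, ei in zip(w, resid))     # e'We
```

With these inputs the coefficients agree with the weighted intercept and slope reported for the numerical demonstration (-3.375 and 4.125), and the data are recovered exactly as fit plus residuals, as in (3.5).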

However, there are at least two reasonable ways in which the 
residuals may be resigned and permuted: 

Case II: Weight the resigned and permuted residuals, and 

Case III: Resign and permute the weighted residuals. 

Both of these cases use the same data summary, b, which was 
computed so as to minimize the objective function e'We, but they 
differ in the way that the pseudo-data sets are constructed. 

In Case II, the resigning and permuting is done without any 
reference to the weights. The weights are considered to be 
associated with the rows of X; thus, whatever residual 
is attached to the \hat{y}_i computed from x_i b will be weighted by 
the weight associated with the ith observation. This approach 
seems reasonable when the weights are used for differential 
precision, since whatever residuals are associated with a sensitive 
area will be so weighted. 

In Case III, the weights are associated with the residuals, 
so that the weighting is done before the resigning and permuting. 
The weighted residuals are therefore added to the value of x_i b to 
compute the pseudo-data sets. This approach seems more appropriate 
when the weights are chosen to operate on the residuals, as 
when chosen to equalize the variance of residuals or in 
robust/resistant regression. 
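
The difference between the two schemes can be made concrete with a single signed permutation. The sketch below is an illustration under stated assumptions, not the author's program: the fit and residuals are those of the weighted numerical example, the weighting in Case III is taken to be multiplication by the square roots of the weights (as in the transformation used for Case III later in the paper), and the helper apply_signed_perm and its (source index, sign) encoding are hypothetical.

```python
import math

fit = [0.75, 4.875, 9.0, 13.125, 17.25]        # weighted fit Xb of the example
resid = [11.25, -4.875, 1.0, -1.125, 3.75]     # residuals e = y - Xb
w = [1.0, 4.0, 9.0, 4.0, 1.0]                  # diagonal of W

# A signed permutation stored as one (source index, sign) pair per row:
# element i of P e equals sign_i * e[source_i]. This one swaps the first
# two residuals and changes no signs.
perm = [(1, 1.0), (0, 1.0), (2, 1.0), (3, 1.0), (4, 1.0)]


def apply_signed_perm(perm, values):
    """Return P v for the signed permutation encoded in perm."""
    return [sign * values[src] for src, sign in perm]


# Case II: permute the raw residuals; the weights stay with the rows of X.
y_case2 = [f + e for f, e in zip(fit, apply_signed_perm(perm, resid))]

# Case III: weight first (multiply by the square root of each weight),
# then permute the weighted residuals and add them to the weighted fit.
v = [math.sqrt(wi) for wi in w]
fit_star = [vi * f for vi, f in zip(v, fit)]
resid_star = [vi * e for vi, e in zip(v, resid)]
y_star_case3 = [f + e for f, e in zip(fit_star, apply_signed_perm(perm, resid_star))]
```

In Case II a large residual keeps its magnitude wherever it lands and is then weighted by the row it lands on; in Case III it carries its own weight with it.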

3.1 Case II: Weighting the Resigned and Permuted Residuals 
The basic definitions for Case II are shown in Table 5. 
Although these definitions are quite similar to the definitions 
for ordinary least squares (Case I), there are some important 
differences which should be noted. The weights are included in 
the catchers, i.e. C = WX(X'WX)^{-1}. The mean of the residuals 
is not in general equal to zero, although the weighted mean, that 
is, the mean of We, is. The value \mu_{2e} does not equal the 
variance of the residuals but is simply the unweighted, uncentered 
mean square of the residuals. 



Insert Table 5 about here 

The pseudo-values of the vector y are formed by 

    y_k = \hat{y} + P_k e ,                         (3.6) 

that is, a signed permutation of the residuals is added to the fit 
as in Case I, and the data are summarized by 

    b_k = C'y_k = b + C'P_k e .                     (3.7) 

The question is still whether or not the elements of b_k often 
differ in sign from the corresponding elements of b. 

Figure D contains the results of a numerical demonstration of 
Case II. The weights were arbitrarily chosen. The weighted 
regression coefficients b and the other basic statistics of Case 
II are shown in the top of the figure. Note that the intercept 
using these weights is negative whereas the unweighted intercept 
is positive. Four signed permutations are shown. P_2 and P_3 
are the same signed permutations as used in Case I, but only one 
results in a difference in sign from the original summary. P_4 
and P_5 were chosen to display the largest intercept and slope 
respectively. 



Insert Figure D about here 



Since the number of signed permutations is the same in weighted 
regression, we cannot reasonably compute all b_k and count the 
number of b_{kj} with different signs from the b_j, nor can we 
compute the number of d_k^2 as far away as the point where all 
elements have different signs, that is 

    d^2 = b'(cov(b_k))^{-1} b .                     (3.8) 

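However, we do know something about the distribution of these 
statistics. We know 

    ave(b_k) = b                                                     (3.9) 

    cov(b_k) = \mu_{2e} C'C 
             = \mu_{2e} (X'WX)^{-1} X'W^2X (X'WX)^{-1}               (3.10) 

    skew(b_{kj}) = 0                                                 (3.11) 

    kurt(b_{kj}) = \beta_{2e} \beta_{2c_j} / N 
                   + (3N/(N-1)) (1 - \beta_{2e}/N) (1 - \beta_{2c_j}/N)   (3.12) 

    ave(d_k^2) = m + 1                                               (3.13) 

    var(d_k^2) = ((1 - \beta_{2e})/(N-1)) (m+1)^2 
                 + 2 ((N - \beta_{2e})/(N-1)) (m+1) 
                 + \sum_i q_{ii}^2 ( \beta_{2e}(N+2)/(N-1) - 3N/(N-1) )   (3.14) 

    ave(\hat{y}_k) = \hat{y}                                         (3.15) 

    cov(\hat{y}_k) = \mu_{2e} X (X'WX)^{-1} X'W^2X (X'WX)^{-1} X'    (3.16) 

The proof is shown in Theorem 2 in the appendix. 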
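
For a sample as small as the five-point demonstration, the average in (3.9) and the diagonal of (3.10) can be checked by brute force over all K = 2^N N! = 3840 signed permutations. The sketch below is only a check under assumptions: the data values are read back from the figures, and the enumeration with itertools is an illustration rather than the procedure used to produce the tables.

```python
from itertools import permutations, product

x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W
n = len(y)

# (X'WX)^{-1} for the two-column design X = [1, x].
s0 = sum(w)
s1 = sum(wi * xi for wi, xi in zip(w, x))
s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
det = s0 * s2 - s1 * s1
inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]

# Catchers C = WX(X'WX)^{-1}; row i holds (c_i0, c_i1).
catchers = [
    [wi * (inv[0][j] + xi * inv[1][j]) for j in range(2)]
    for wi, xi in zip(w, x)
]
b = [sum(c[j] * yi for c, yi in zip(catchers, y)) for j in range(2)]   # b = C'y
resid = [yi - (b[0] + b[1] * xi) for xi, yi in zip(x, y)]
mu2e = sum(e * e for e in resid) / n       # unweighted, uncentered mean square

# Enumerate every signed permutation P_k and accumulate b_k - b = C'P_k e.
count = 0
sum_dev = [0.0, 0.0]
sum_sq = [0.0, 0.0]
for order in permutations(range(n)):
    for signs in product((1.0, -1.0), repeat=n):
        pe = [s * resid[i] for s, i in zip(signs, order)]
        for j in range(2):
            dev = sum(c[j] * val for c, val in zip(catchers, pe))
            sum_dev[j] += dev
            sum_sq[j] += dev * dev
        count += 1

ave_b = [b[j] + sum_dev[j] / count for j in range(2)]                  # (3.9)
var_enumerated = [sum_sq[j] / count for j in range(2)]
var_formula = [mu2e * sum(c[j] ** 2 for c in catchers) for j in range(2)]  # diag of (3.10)
```

The enumerated variances agree with \mu_{2e} C'C, and both match the var(b_{kj}) entries of Table 6 (57.4901 and 5.2080) to the printed precision.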
It is easily shown that 

    lim_{N \to \infty} kurt(b_{kj}) = 3                         (3.17) 

and thus, as the sample grows large, given fixed \beta_{2e} and 
\beta_{2c_j}, the moments of the distribution of b_{kj} approach the 
moments of the normal distribution. Also, the variance of d_k^2, 

    lim_{N \to \infty} var(d_k^2) = 2(m + 1) ,                  (3.18) 

approaches the variance of the \chi^2 distribution with m + 1 
d.f. If we accept the normal and \chi^2 distributions as close 
enough for our purposes, then we can compute 

    z_j = b_j / \sqrt{var(b_{kj})}                              (3.19) 

and 

    F* = d^2 / (m + 1)                                          (3.20) 

which can be referred to normal curve or F tables for approximate 
proportions to be used in the decision rules. The numerical 
results for Case II are shown in Table 6. 
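
As an illustration, the decision statistics (3.19) and (3.20) for Case II can be computed directly from the moment formulas, without enumerating permutations. The sketch assumes the five-point data read back from the figures. The helper solve2 is hypothetical, and the one-tailed normal area p_z is an assumption about how the normal table is to be read.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W
n, m = len(y), 1


def solve2(a, rhs):
    """Solve the 2x2 linear system a z = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (a[1][1] * rhs[0] - a[0][1] * rhs[1]) / det,
        (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det,
    ]


rows = [[1.0, xi] for xi in x]             # rows of X
xwx = [[sum(wi * r[j] * r[l] for wi, r in zip(w, rows)) for l in range(2)]
       for j in range(2)]
xwy = [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, y)) for j in range(2)]
b = solve2(xwx, xwy)                       # weighted least squares summary
resid = [yi - (b[0] * r[0] + b[1] * r[1]) for r, yi in zip(rows, y)]
mu2e = sum(e * e for e in resid) / n

# Catchers C = WX(X'WX)^{-1}: u_j is column j of (X'WX)^{-1}, c_ij = w_i * x_i'u_j.
u = [solve2(xwx, unit) for unit in ([1.0, 0.0], [0.0, 1.0])]
catchers = [[wi * (r[0] * u[j][0] + r[1] * u[j][1]) for j in range(2)]
            for wi, r in zip(w, rows)]

cov = [[mu2e * sum(c[j] * c[l] for c in catchers) for l in range(2)]
       for j in range(2)]                                       # (3.10)
z = [b[j] / math.sqrt(cov[j][j]) for j in range(2)]             # (3.19)
d2 = sum(bj * vj for bj, vj in zip(b, solve2(cov, b)))          # (3.8)
f_star = d2 / (m + 1)                                           # (3.20)
p_z = [0.5 * math.erfc(abs(zj) / math.sqrt(2.0)) for zj in z]   # one-tailed area
```

The resulting |z_j|, d^2 and F* agree with the Case II entries of Table 6 (.445, 1.808, 10.8958 and 5.448); the table reports z_j without its sign.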



Insert Table 6 about here 



3.2 Comment 

Case II is analogous to the situation in sampling in which 
one makes the usual assumptions of ordinary least squares but 
weights the data anyway. That is, one assumes that 
y = X\beta + \epsilon, where \beta is the vector of population 
parameters, and that E(\epsilon) = 0 and 
E(\epsilon\epsilon') = \sigma_p^2 I, where \sigma_p^2 is the 
variance of \epsilon in the population. Under these assumptions, 

E(^2e) = %^ (3-21) 

The value b is an unbiased estimator, 

    E(b) = \beta ,                                              (3.22) 

and the sample-to-sample variance of b is 

    cov(b) = \sigma_p^2 (X'WX)^{-1} X'W^2X (X'WX)^{-1} .        (3.23) 

Thus the difference between the signed permutation approach and 
weighted regression under these assumptions lies in using 
degrees of freedom as a divisor instead of the sample size N. 

It is worth noting that Case II results in more complex and 
unusual calculations than Case I and, as we shall see, Case III. 
To compute cov(b_k) in one pass over the data requires some 
unweighted summations, summations using the weights, and summations 
using the squares of the weights. We know of no computer programs 
that come close to these calculations. Weighting for differential 
precision will, therefore, come at considerable cost. 
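
A minimal sketch of such a one-pass computation follows. It is an illustration under assumptions (the five-point data read back from the figures, a straight-line model, hypothetical variable names), not an existing program. The identity e'e = y'y - 2b'X'y + b'X'Xb is used so that the residuals never have to be formed in a second pass.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W

# One pass over the data, accumulating three families of sums.
n = 0
sx = sxx = sy = sxy = syy = 0.0            # unweighted sums
sw = swx = swxx = swy = swxy = 0.0         # sums using the weights
sww = swwx = swwxx = 0.0                   # sums using the squared weights
for xi, yi, wi in zip(x, y, w):
    n += 1
    sx += xi
    sxx += xi * xi
    sy += yi
    sxy += xi * yi
    syy += yi * yi
    sw += wi
    swx += wi * xi
    swxx += wi * xi * xi
    swy += wi * yi
    swxy += wi * xi * yi
    sww += wi * wi
    swwx += wi * wi * xi
    swwxx += wi * wi * xi * xi

det = sw * swxx - swx ** 2
a_inv = [[swxx / det, -swx / det], [-swx / det, sw / det]]      # (X'WX)^{-1}
b0 = a_inv[0][0] * swy + a_inv[0][1] * swxy
b1 = a_inv[1][0] * swy + a_inv[1][1] * swxy

# e'e = y'y - 2 b'X'y + b'X'X b, built from the unweighted sums only.
ete = (syy - 2.0 * (b0 * sy + b1 * sxy)
       + b0 * b0 * n + 2.0 * b0 * b1 * sx + b1 * b1 * sxx)
mu2e = ete / n


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# cov(b_k) = mu2e * (X'WX)^{-1} X'W^2X (X'WX)^{-1}
b_mat = [[sww, swwx], [swwx, swwxx]]                            # X'W^2X
core = matmul2(matmul2(a_inv, b_mat), a_inv)
cov = [[mu2e * core[i][j] for j in range(2)] for i in range(2)]
```

The expanded form of e'e can lose precision when the residuals are small relative to y; a production program would likely guard against that.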

3.3 Case III: Resigning and Permuting the Weighted Residuals 
To examine the effect of the residuals in Case III, it is 
convenient to transform the data in such a way that the data can be 
handled as if they were unweighted. This can be done by defining 
a diagonal matrix V in which the diagonal elements are the square 
roots of the equivalent elements in W; thus, V^2 = V'V = W. 
Using an asterisk to identify transformed data, we can form 
y* = Vy and X* = VX. The catchers are C = X*(X*'X*)^{-1} = 
VX(X'WX)^{-1} and the regression coefficients are 

    b* = C'y* = b ,                                 (3.24) 

which is to say that the regression coefficients are not changed 
by the transformation. However, the fit 

    \hat{y}* = X*b = V\hat{y}                       (3.25) 

and the residuals 

    e* = y* - \hat{y}* = Ve                         (3.26) 

are. Note that the residuals e_i* do not sum to zero and that 
N \mu_{2e*} = e*'e* = e'We, which is the objective function which was 
minimized. The notation for Case III is summarized in both 
weighted and unweighted form in Table 7. 
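
The transformation can be illustrated as follows. The sketch assumes the five-point data read back from the figures; the function least_squares_2col is a hypothetical helper that performs ordinary least squares on the two transformed columns of X* (no additional intercept column is added, since the column of ones has itself been multiplied by V).

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W
v = [math.sqrt(wi) for wi in w]            # diagonal of V, with V'V = W


def least_squares_2col(col0, col1, target):
    """Ordinary least squares of target on two columns (normal equations)."""
    a00 = sum(c * c for c in col0)
    a01 = sum(c0 * c1 for c0, c1 in zip(col0, col1))
    a11 = sum(c * c for c in col1)
    r0 = sum(c * t for c, t in zip(col0, target))
    r1 = sum(c * t for c, t in zip(col1, target))
    det = a00 * a11 - a01 * a01
    return (a11 * r0 - a01 * r1) / det, (a00 * r1 - a01 * r0) / det


# Transformed data y* = Vy and X* = VX (the columns of X are 1 and x).
y_star = [vi * yi for vi, yi in zip(v, y)]
x0_star = list(v)
x1_star = [vi * xi for vi, xi in zip(v, x)]

b0, b1 = least_squares_2col(x0_star, x1_star, y_star)                # (3.24)
fit_star = [b0 * c0 + b1 * c1 for c0, c1 in zip(x0_star, x1_star)]   # (3.25)
resid_star = [ys - fs for ys, fs in zip(y_star, fit_star)]           # (3.26)
sum_resid_star = sum(resid_star)           # not zero in general
objective = sum(e * e for e in resid_star) # e*'e* = e'We
```

The unweighted fit to the starred data reproduces the weighted coefficients, while the starred residuals sum to 6 rather than to zero in this example.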



Insert Table 7 about here 

Using the weighted data, the pseudo-values of y can be 
defined as 

    y_k* = \hat{y}* + P_k e* ,                      (3.27) 

and the regression coefficients computed using these values are 

    b_k = C'y_k* = b + C'P_k e* .                   (3.28) 

Figure E contains a numerical demonstration of this signed 
permutation scheme. The weights are the same as in Case II; thus the 
regression coefficients are the same. The values of y*, X*, 
and C are shown. The first two signed permutation matrices, 
P_2 and P_3, are the same as used in the previous cases and the 
last two, P_4 and P_5, were selected to result in the maximum 
intercept and slope respectively under this signed permutation 
scheme. The demonstration shows that the distribution of b_k is 
quite different in Case III than in Case II. 



Insert Figure E about here 



As in the previous cases, we cannot compute all K values 
of b_k except in very small samples, and thus we cannot calculate 
the proportions needed for using the decision rules. However, we 
again know the moments of the distribution exactly if we compute 

    ave(b_k) = b                                                     (3.29) 

    cov(b_k) = \mu_{2e*} C'C = (e'We/N) (X'WX)^{-1}                  (3.30) 

    skew(b_{kj}) = 0                                                 (3.31) 

    kurt(b_{kj}) = \beta_{2e*} \beta_{2c_j} / N 
                   + (3N/(N-1)) (1 - \beta_{2e*}/N) (1 - \beta_{2c_j}/N)   (3.32) 

    ave(d_k^2) = m + 1                                               (3.33) 

    var(d_k^2) = ((1 - \beta_{2e*})/(N-1)) (m+1)^2 
                 + 2 ((N - \beta_{2e*})/(N-1)) (m+1) 
                 + \sum_i q_{ii}^2 ( \beta_{2e*}(N+2)/(N-1) - 3N/(N-1) )   (3.34) 

    ave(\hat{y}_k) = \hat{y}                                         (3.35) 

    cov(\hat{y}_k) = \mu_{2e*} X (X'WX)^{-1} X'                      (3.36) 

Proof is shown in Theorem 3. Since kurt(b_{kj}) and 
var(d_k^2) approach the same limits as before, we can compute 

    z_j = b_j / \sqrt{var(b_{kj})}                                   (3.37) 

and 

    F* = d^2 / (m + 1)                                               (3.38) 

which can be referred to normal curve or F tables for the 
approximate proportions to be used in the decision rules. 
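
For the five-point demonstration, the Case III moments can again be checked against a full enumeration of the 3840 signed permutations. The sketch below assumes the data read back from the figures and is meant only as a check of (3.30) and (3.33), not as the procedure behind the tables.

```python
import math
from itertools import permutations, product

x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W
n, m = len(y), 1
v = [math.sqrt(wi) for wi in w]            # diagonal of V

s0 = sum(w)
s1 = sum(wi * xi for wi, xi in zip(w, x))
s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
det = s0 * s2 - s1 * s1
xwx = [[s0, s1], [s1, s2]]
inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]            # (X'WX)^{-1}

# Catchers C = VX(X'WX)^{-1}; b = C'y*; e* = Ve.
catchers = [[vi * (inv[0][j] + xi * inv[1][j]) for j in range(2)]
            for vi, xi in zip(v, x)]
y_star = [vi * yi for vi, yi in zip(v, y)]
b = [sum(c[j] * ys for c, ys in zip(catchers, y_star)) for j in range(2)]
resid_star = [vi * (yi - (b[0] + b[1] * xi)) for vi, xi, yi in zip(v, x, y)]
mu2 = sum(e * e for e in resid_star) / n                        # e'We / N
cov = [[mu2 * inv[j][l] for l in range(2)] for j in range(2)]   # (3.30)


def dist2(dev):
    """Squared distance dev'(cov)^{-1} dev, using cov^{-1} = X'WX / mu2."""
    return sum(dev[j] * xwx[j][l] * dev[l]
               for j in range(2) for l in range(2)) / mu2


count = 0
total_d2 = 0.0
sum_sq = [0.0, 0.0]
for order in permutations(range(n)):
    for signs in product((1.0, -1.0), repeat=n):
        pe = [s * resid_star[i] for s, i in zip(signs, order)]
        dev = [sum(c[j] * val for c, val in zip(catchers, pe)) for j in range(2)]
        total_d2 += dist2(dev)             # d_k^2 for this permutation
        for j in range(2):
            sum_sq[j] += dev[j] ** 2
        count += 1

ave_d2 = total_d2 / count                                       # (3.33)
var_enumerated = [s / count for s in sum_sq]
d2 = dist2(b)                              # distance of b from the origin
f_star = d2 / (m + 1)                                           # (3.38)
```

The enumerated average of d_k^2 equals m + 1 = 2, and the variances, d^2 and F* agree with Table 8 (30.7258, 3.1219, 36.2613 and 18.13).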

The numerical values for this signed permutation scheme are 
summarized in Table 8. 

Insert Table 8 about here 



3.4 Comment 

Case III is analogous to the sampling situation in which one 
assumes that y = X\beta + \epsilon, that E(\epsilon) = 0, and that 
E(\epsilon\epsilon') = \sigma_p^2 W^{-1}, where W^{-1} is the 
inverse of the fixed matrix 

W . Under these assumptions, 

The proofs that E(b) = \beta and that var(b) = \sigma_p^2 (X'WX)^{-1} 
are in Draper and Smith (1966). 

It is worth noting that the computations necessary to use 
Case III are simple. The necessary summaries are y'Wy, y'WX, 
and X'WX, from which decision statistics can be computed. In Case 
III, as in Case II, the decision statistics are not affected by 
multiplying W by any positive constant. 
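
A short sketch of how the decision statistics follow from the three summaries, and of their invariance to the scale of W, is given below. The function names case3_decision_stats and summaries are hypothetical, the data are the five points read back from the figures, and only the two-parameter case is handled. The residual sum of squares is obtained from e'We = y'Wy - b'X'Wy.

```python
import math


def case3_decision_stats(ywy, xwy, xwx, n):
    """Return b, z_j and F* for a two-parameter fit from y'Wy, X'Wy, X'WX."""
    det = xwx[0][0] * xwx[1][1] - xwx[0][1] * xwx[1][0]
    inv = [[xwx[1][1] / det, -xwx[0][1] / det],
           [-xwx[1][0] / det, xwx[0][0] / det]]
    b = [inv[j][0] * xwy[0] + inv[j][1] * xwy[1] for j in range(2)]
    ewe = ywy - (b[0] * xwy[0] + b[1] * xwy[1])      # e'We = y'Wy - b'X'Wy
    mu2 = ewe / n
    z = [b[j] / math.sqrt(mu2 * inv[j][j]) for j in range(2)]
    d2 = sum(b[j] * xwx[j][l] * b[l]
             for j in range(2) for l in range(2)) / mu2
    return b, z, d2 / len(b)                         # F* = d^2 / (m + 1)


def summaries(x, y, w):
    """Accumulate y'Wy, X'Wy and X'WX for the design X = [1, x]."""
    ywy = sum(wi * yi * yi for wi, yi in zip(w, y))
    xwy = [sum(wi * yi for wi, yi in zip(w, y)),
           sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))]
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    xwx = [[sum(w), s1], [s1, sum(wi * xi * xi for wi, xi in zip(w, x))]]
    return ywy, xwy, xwx


x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
w = [1.0, 4.0, 9.0, 4.0, 1.0]              # diagonal of W

b, z, f_star = case3_decision_stats(*summaries(x, y, w), n=len(y))

# Multiplying W by a positive constant leaves b, z_j and F* unchanged.
w_scaled = [10.0 * wi for wi in w]
b_s, z_s, f_star_s = case3_decision_stats(*summaries(x, y, w_scaled), n=len(y))
```

The invariance holds because the constant cancels between \mu_{2e*} and (X'WX)^{-1} in the covariance, and between X'WX and \mu_{2e*} in d^2.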



Insert Table 9 about here 

4. CONCLUSION 

The signed permutation of residuals leads to an interpretation 
of least squares calculations without sampling assumptions, but 
at the cost of the inability to generalize formally to a population. 
Since in many applications the assumption of a random sample from 
a known distribution may be unwarranted, or an attempt at an actual 
random sample may be thwarted by practical concerns, the 
interpretation of regression statistics as measures of fit in the 
obtained data may be the best that a researcher can do. Such an 
interpretation is not trivial, however, since substantial confidence 
in a model may come from fitting the model to many data sets under 
many conditions and finding that the resultant coefficients are 
reasonably similar. 

Signed permutations may be most useful in robust/resistant 
regression, where the iterative reweighting clearly violates the 
assumption of known, fixed weights and thus the sampling theory of 
weighted regression. At least the "probability statistics" 
generated from iterative reweighting have an interpretation as 
measures of goodness-of-fit. These goodness-of-fit measures may, 
therefore, be used as a criterion for the effectiveness of weighting 
systems or for comparing different systems now in use. 
Footnotes 

1. The author is indebted to Paul Holland for suggesting the sign 
changes. 

2. The algorithm for finding the maximum values is due to Lustig 
(1979). To find the maximum b_{kj}, the column c_j and the residuals 
are each rearranged into descending order of absolute magnitude and 
the signs of the residuals are changed to be the same as the 
corresponding elements of c_j. 
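
The rule in this footnote can be sketched and checked by brute force on the Case I example. The residuals, intercept and regressor values below are taken from the ordinary least squares demonstration as read back from the figures (an assumption), and the function name is hypothetical.

```python
from itertools import permutations, product

x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
resid = [7.0, -8.0, -1.0, -2.0, 4.0]       # OLS residuals of the Case I example
b0 = 2.0                                   # OLS intercept of the same example
n = len(x)

# Intercept column of the catchers C = X(X'X)^{-1} for X = [1, x].
sx = sum(x)
sxx = sum(xi * xi for xi in x)
det = n * sxx - sx * sx
c0 = [(sxx - sx * xi) / det for xi in x]


def max_over_signed_permutations(c, e):
    """Largest c'Pe: pair the sorted magnitudes, with signs made to agree."""
    c_mag = sorted((abs(ci) for ci in c), reverse=True)
    e_mag = sorted((abs(ei) for ei in e), reverse=True)
    return sum(ci * ei for ci, ei in zip(c_mag, e_mag))


max_b0 = b0 + max_over_signed_permutations(c0, resid)
min_b0 = b0 - (max_b0 - b0)                # by the sign symmetry of the set

# Brute-force check over all 2^N N! signed permutations.
brute = max(
    sum(ci * s * resid[i] for ci, s, i in zip(c0, signs, order))
    for order in permutations(range(n))
    for signs in product((1.0, -1.0), repeat=n)
)
```

Once the signs agree, every product is non-negative, and pairing the largest magnitudes with each other maximizes the sum. For this example the maximum intercept is 14 and the minimum is -10.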

3. The analogous test in sampling theory is seldom used; that is, 
we seldom test the hypothesis that all parameters including the 
intercept are simultaneously equal to zero. The test of the subset 

2 

containing only b^^ would result in an F* = 

4. The sum of squared residuals is 
e'e = \epsilon'(I - X(X'X)^{-1}X')\epsilon, where \epsilon is the 
vector of population residuals. Tukey has suggested weighting 
the e_i by the corresponding inverse square root of the diagonal of 
(I - X(X'X)^{-1}X'). This idea has not been followed up at this time. 
Table 1. Notation for Data 

Statistic                 Definition 

N                         number of observations 
m                         number of regressors 
i, i' = 1, 2, ..., N      indices of observations 
j = 0, 1, 2, ..., m       indices of regressors 
y = {y_i}                 Nth order vector of response values to 
                          be fitted 
X = {x_ij}                N x (m + 1) matrix of values of regressors. 
                          All elements x_i0 = 1. The rank of X is 
                          m + 1. 
W                         N x N positive definite diagonal matrix of 
                          weights 

NOTE: All data elements are fixed, known, finite, real numbers. 
Table 2. Ordinary Least Squares Definitions 

Statistic                                    Description 

\bar{y} = \sum_i y_i / N                     mean value of the y_i 
s_y^2 = \sum_i (y_i - \bar{y})^2 / (N-1)     variance of the y_i 
b = {b_j} = (X'X)^{-1} X'y                   (m+1)th order vector of 
                                             regression coefficients 
\hat{y} = {\hat{y}_i} = Xb                   Nth order vector of 
                                             fitted values 
e = {e_i} = y - Xb                           Nth order vector of 
                                             residuals 
s_e = \sqrt{e'e / (N-m-1)}                   standard error of estimate 
R^2                                          squared multiple correlation 
F = [\sum_i (\hat{y}_i - \bar{y})^2 / m] 
    / [\sum_i (y_i - \hat{y}_i)^2 / (N-m-1)] test statistic for 
                                             \beta_1 = \beta_2 = ... = \beta_m = 0 
p(F)                                         probability associated with F 
cov(b) = {cov(b_jj')} = s_e^2 (X'X)^{-1}     (m+1) by (m+1) matrix of variances 
                                             and covariances of b 
SE(b_j) = \sqrt{cov(b_jj)}                   standard error of b_j 
t_j = b_j / SE(b_j)                          test statistic for \beta_j = 0 
p(t_j)                                       probability associated with t_j 
                                             (usually two tailed) 
Table 3. Case I: Ordinary Least Squares Definitions for Signed Permutation 

Statistic                                       Description 

C = {c_ij} = X(X'X)^{-1}                        N x (m+1) matrix of catchers 
                                                or generalized inverse of X' 
c_j                                             The jth column of C 
Q = {q_ii'} = C(C'C)^{-1}C'                     N x N idempotent matrix 
b = {b_j} = C'y                                 (m+1)th order vector of 
                                                regression coefficients 
\hat{y} = {\hat{y}_i} = Xb                      Nth order vector of fitted values 
e = {e_i} = y - \hat{y}                         Nth order vector of residuals 
\bar{e} = N^{-1} \sum_i e_i = 0                 Mean of residuals 
\mu_{2e} = \sigma_e^2 = N^{-1} \sum_i e_i^2     Mean square or variance of 
                                                residuals 
y_k = {y_ki} = \hat{y} + P_k e                  Constructed values of y for the 
                                                kth signed permutation 
b_k = {b_kj} = C'y_k                            Regression coefficients for the 
                                                kth signed permutation 
\hat{y}_k = {\hat{y}_ki} = X b_k                Fitted values for the kth 
                                                signed permutation 
ave(b_k) = \bar{b} = K^{-1} \sum_k b_k          Average value of b_k 
cov(b_k) = K^{-1} \sum_k (b_k - \bar{b})(b_k - \bar{b})' 
                                                Covariance of b_k 
d_k^2 = (b_k - b)'(cov(b_k))^{-1}(b_k - b)      Squared distance of b_k from b 
Table 4. Case I: Ordinary Least Squares 
Summary of K = 3840 Regression Equations 

             Regression Coefficient          Squared Distance 

Statistic      Intercept    Slope        Statistic     Distance 
               b_k0         b_k1                       d_k^2 

ave(b_kj)      2.0000       3.0000       ave(d_k^2)    2.0000 
var(b_kj)      29.4800      2.6800       var(d_k^2)    1.8058 
skew(b_kj)     0            0                          -- 
kurt(b_kj)     2.1500       2.1828       d^2           25.9328 
z_j            .368         1.833        F*            12.966 
p(z_j)         .35          .03          p(F*)         ≅ 0 
P(z_j)         .3792        .0229        P(F*)         0 

NOTE: z_j = b_j / \sqrt{var(b_kj)}. p(z_j) is from the normal table, 
and P(z_j) is computed by dividing the number of b_kj with different 
signs by K = 3840. d^2 is the distance of b from the origin, i.e. 
\mu_{2e}^{-1} b'(C'C)^{-1} b. F* is d^2/(m + 1), p(F*) is from the 
F table with m + 1 and \infty d.f., and P(F*) is computed by dividing 
the number of d_k^2 > d^2 by K. 
Table 5. Case II: Weighting the Signed and Permuted Residuals Definitions 

Statistic                                       Description 

C = {c_ij} = WX(X'WX)^{-1}                      N x (m+1) matrix of catchers 
                                                or generalized inverse of X' 
c_j                                             The jth column of C 
Q = {q_ii'} = C(C'C)^{-1}C'                     N x N idempotent matrix 
b = {b_j} = C'y                                 (m+1)th order vector of 
                                                regression coefficients 
\hat{y} = {\hat{y}_i} = Xb                      Nth order vector of fitted values 
e = {e_i} = y - \hat{y}                         Nth order vector of residuals 
\bar{e} = N^{-1} \sum_i e_i                     Mean residual 
\sigma_e^2 = N^{-1} \sum_i (e_i - \bar{e})^2    Variance of residuals 
\mu_{2e} = \sigma_e^2 + \bar{e}^2 = N^{-1} \sum_i e_i^2 
                                                Mean square residual 
y_k = {y_ki} = \hat{y} + P_k e                  Constructed values of y for the 
                                                kth signed permutation 
b_k = {b_kj} = C'y_k                            Regression coefficients for the 
                                                kth signed permutation 
\hat{y}_k = {\hat{y}_ki} = X b_k                Fitted values for the kth 
                                                signed permutation 
ave(b_k) = \bar{b} = K^{-1} \sum_k b_k          Average value of b_k 
cov(b_k) = K^{-1} \sum_k (b_k - \bar{b})(b_k - \bar{b})' 
                                                Covariance of b_k 
d_k^2 = (b_k - b)'(cov(b_k))^{-1}(b_k - b)      Squared distance of b_k from b 
Table 6. Case II: Weighting the Resigned and Permuted Residuals 
Summary of K = 3840 Regression Equations 

             Regression Coefficient          Squared Distance 

Statistic      Intercept    Slope        Statistic     Distance 
               b_k0         b_k1                       d_k^2 

ave(b_kj)      -3.3750      4.1250       ave(d_k^2)    2.0000 
var(b_kj)      57.4901      5.2080       var(d_k^2)    1.6500 
skew(b_kj)     --                                      -- 
kurt(b_kj)     2.0165       2.0067       d^2           10.8958 
z_j            .445         1.808        F*            5.448 
p(z_j)         .33          .03          p(F*)         ≅ 0 
P(z_j)         .3547        .0229        P(F*)         0 

NOTE: z_j = b_j / \sqrt{var(b_kj)}. p(z_j) is from the normal table, 
and P(z_j) is computed by dividing the number of b_kj with different 
signs by K = 3840. d^2 is the distance of b from the origin, i.e. 
\mu_{2e}^{-1} b'(C'C)^{-1} b. F* is d^2/(m + 1), p(F*) is from the 
F table with m + 1 and \infty d.f., and P(F*) is computed by dividing 
the number of d_k^2 > d^2 by K. 
Table 7. Case III: Permuting the Weighted Residuals Definitions 

Statistic                                       Description 

V                                               N x N matrix such that V^2 = W 
y* = {y_i*} = Vy                                Weighted values of y 
X* = {x_ij*} = VX                               Weighted values of X 
C = {c_ij} = VX(X'WX)^{-1}                      N x (m+1) matrix of catchers or 
                                                generalized inverse of X' 
Q = {q_ii'} = C(C'C)^{-1}C'                     N x N idempotent matrix 
b = {b_j} = C'Vy = C'y*                         (m+1)th order vector of regression 
                                                coefficients 
\hat{y} = {\hat{y}_i} = Xb                      Nth order vector of fitted values 
\hat{y}* = {\hat{y}_i*} = X*b = V\hat{y}        Weighted values of \hat{y} 
e = {e_i} = y - \hat{y}                         Nth order vector of residuals 
e* = {e_i*} = y* - \hat{y}* = Ve                Weighted values of residuals 
\bar{e}*                                        Mean of weighted residuals 
\mu_{2e*}                                       Mean square of weighted residuals 
y_k* = {y_ki*} = \hat{y}* + P_k e*              Weighted values of y for the kth 
                                                signed permutation 
b_k = {b_kj} = C'Vy_k = C'y_k*                  Regression coefficients for the 
                                                kth signed permutation 
\hat{y}_k = {\hat{y}_ki} = X b_k                Fitted values for the kth 
                                                signed permutation 
\hat{y}_k*                                      Weighted values of \hat{y}_k 
ave(b_k) = \bar{b} = K^{-1} \sum_k b_k          Average value of b_k 
cov(b_k) = K^{-1} \sum_k (b_k - \bar{b})(b_k - \bar{b})' 
                                                Covariance of b_k 
d_k^2 = (b_k - b)'(cov(b_k))^{-1}(b_k - b)      Squared distance of b_k from b 
Table 8. Case III: Resigning and Permuting the Weighted Residuals 
Summary of K = 3840 Regression Equations 

             Regression Coefficient          Squared Distance 

Statistic      Intercept    Slope        Statistic     Distance 
               b_k0         b_k1                       d_k^2 

ave(b_kj)      -3.3750      4.1250       ave(d_k^2)    2.0000 
var(b_kj)      30.7258      3.1219       var(d_k^2)    1.7422 
skew(b_kj)     0            0 
kurt(b_kj)     2.1728       2.1722       d^2           36.2613 
z_j            .609         2.33         F*            18.13 
p(z_j)         .27          .01          p(F*)         ≅ 0 
P(z_j)         .2862        0            P(F*)         0 

NOTE: z_j = b_j / \sqrt{var(b_kj)}. p(z_j) is from the normal table, 
and P(z_j) is computed by dividing the number of b_kj with different 
signs by K = 3840. d^2 is the distance of b from the origin, i.e. 
\mu_{2e*}^{-1} b'(C'C)^{-1} b. F* is d^2/(m + 1), p(F*) is from the 
F table with m + 1 and \infty d.f., and P(F*) is computed by dividing 
the number of d_k^2 > d^2 by K. 
Table 9. Summary of Signed Permutations and Moment Notation 

Statistic          Description 

Signed Permutations 

K = 2^N N!         Number of possible signed permutations 
k = 1, 2, ..., K   Index of signed permutations 
P_k                N x N signed permutation matrix. Each row 
                   and column has exactly one nonzero element 
                   which may be either +1 or -1. 

Moment Notation 

\mu_{pz}           The pth (uncorrected) moment of variable 
                   z_j (j = 1, 2, ..., Z) 
\beta_{1z}         The (uncorrected) skewness of z 
\beta_{2z}         The (uncorrected) kurtosis of z 
Figure A. Case I: Ordinary Least Squares 
Numerical Example 



Figure B. Case I: Ordinary Least Squares 
Signed Permutations 

Values of e_k and y_k 

     e_2    y_2      e_3    y_3      e_4    y_4      e_5    y_5 
     -7     -2       -8     -3        8     13       -8     -3 
     -8      0        7     15        7     15       -4      4 
     -1     10       -1     10        2     13        1     12 
     -2     12       -2     12       -1     13        2     16 
      4     21        4     21       -4     13        7     24 

Signed Permutation Matrices 

     P_2                  P_3 
     -1  0  0  0  0        0  1  0  0  0 
      0  1  0  0  0        1  0  0  0  0 
      0  0  1  0  0        0  0  1  0  0 
      0  0  0  1  0        0  0  0  1  0 
      0  0  0  0  1        0  0  0  0  1 

     P_4                  P_5 
      0 -1  0  0  0        0  1  0  0  0 
      1  0  0  0  0        0  0  0  0 -1 
      0  0  0 -1  0        0  0 -1  0  0 
      0  0  1  0  0        0  0  0 -1  0 
      0  0  0  0 -1        1  0  0  0  0 

Regression Coefficients (recomputed as b_k = C'y_k from the y_k above) 

     b_2 = (-9.2, 5.8)'     b_3 = (-2.5, 4.5)' 
     b_4 = (14.0, -0.2)'    b_5 = (-9.2, 6.6)' 

The minimum b_kj can be computed from the maximum by 
min b_kj = b_j - (max b_kj - b_j); for example, the 
minimum intercept = 2 - (14 - 2) = -10. 
Figure D. Case II: Weighting the Permuted Residuals 
Regression Analysis 

     \mu_{2e} = 33.33          b = (-3.375, 4.125)' 

      w      \hat{y}       e          C = WX(X'WX)^{-1} 
      1        .75       11.25        .4276     -.125 
      4       4.875      -4.875       .9605     -.250 
      9       9.00        1.00        .4737      0 
      4      13.125      -1.125      -.5395      .250 
      1      17.25        3.75       -.3224      .125 

Signed Permutations 

      e_2      y_2        e_3      y_3        e_4      y_4        e_5      y_5 
    -11.25   -10.50      -4.88    -4.13       1.13     1.88      -3.75    -3.00 
     -4.88     0.00      11.25    16.13      11.25    16.13     -11.25    -6.38 
      1.00    10.00       1.00    10.00       3.75    12.75       1.00    10.00 
     -1.13    12.00      -1.13    12.00      -4.88     8.25       4.88    18.00 
      3.75    21.00       3.75    21.00      -1.00    16.25       1.13    18.38 

     P_2                  P_3 
     -1  0  0  0  0        0  1  0  0  0 
      0  1  0  0  0        1  0  0  0  0 
      0  0  1  0  0        0  0  1  0  0 
      0  0  0  1  0        0  0  0  1  0 
      0  0  0  0  1        0  0  0  0  1 

     P_4                  P_5 
      0  0  0 -1  0        0  0  0  0 -1 
      1  0  0  0  0       -1  0  0  0  0 
      0  0  0  0  1        0  0  1  0  0 
      0  1  0  0  0        0 -1  0  0  0 
      0  0 -1  0  0        0  0  0 -1  0 

     b_2 = (-13.00, 6.94)'     b_4 = (12.65, -0.17)' 

NOTE: The data are y and X shown in Figure A. 
Figure E. Case III: Permuting the Weighted Residuals 
Regression Analysis 

     \mu_{2e*} = 49.95         b = (-3.375, 4.125)' 

     \hat{y}*        e*          C = VX(X'WX)^{-1} 
       .750       11.250        .4276     -.125 
      9.750       -9.750        .4803     -.125 
     27.000        3.000        .1579      0 
     26.250       -2.250       -.2697      .125 
     17.250        3.750       -.3224      .125 

Signed Permutations 
Values of e_k* and y_k* 

      e_2*     y_2*       e_3*     y_3*       e_4*     y_4*       e_5*     y_5* 
    -11.25   -10.50      -9.75    -9.00       9.75    10.50     -11.25   -10.50 
     -9.75     0.00      11.25    21.00      11.25    21.00      -9.75     0.00 
      3.00    30.00       3.00    30.00       2.25    29.25       2.25    29.25 
     -2.25    24.00      -2.25    24.00      -3.00    23.25       3.75    30.00 
      3.75    21.00       3.75    21.00      -3.75    13.50       3.00    20.25 

     P_2                  P_3 
     -1  0  0  0  0        0  1  0  0  0 
      0  1  0  0  0        1  0  0  0  0 
      0  0  1  0  0        0  0  1  0  0 
      0  0  0  1  0        0  0  0  1  0 
      0  0  0  0  1        0  0  0  0  1 

     P_4                  P_5 
      0 -1  0  0  0       -1  0  0  0  0 
      1  0  0  0  0        0  1  0  0  0 
      0  0  0 -1  0        0  0  0 -1  0 
      0  0 -1  0  0        0  0  0  0  1 
      0  0  0  0 -1        0  0  1  0  0 

NOTE: The data y and X are shown in Figure A and the weights are shown 
APPENDIX 

Lemmas 

For the following lemmas, a is an arbitrary column vector with 
elements a_i (i = 1, 2, ..., N), \bar{a} = \sum_i a_i / N, 
\sigma_a^2 = \sum_i (a_i - \bar{a})^2 / N, 
\mu_{2a} = \sum_i a_i^2 / N = \sigma_a^2 + \bar{a}^2, and 
\mu_{4a} = \sum_i a_i^4 / N. 

For all Nth order signed permutation matrices P_k, k = 1, 2, ..., K, 
where K = 2^N N!, there is a signed permutation vector a_k = P_k a with 
elements a_{ki}. 

The subscript i' = 1, 2, ..., N but does not equal i. 
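Lemma 1: \sum_i \sum_{i'} a_i^2 a_{i'}^2 = N(N \mu_{2a}^2 - \mu_{4a}) 

Proof: 

    \sum_i \sum_{i'} a_i^2 a_{i'}^2 
        = a_1^2 a_2^2 + a_1^2 a_3^2 + ... + a_N^2 a_{N-1}^2 
        = a_1^2 (\sum_i a_i^2 - a_1^2) + a_2^2 (\sum_i a_i^2 - a_2^2) 
          + ... + a_N^2 (\sum_i a_i^2 - a_N^2) 
        = (\sum_i a_i^2)^2 - \sum_i a_i^4 
        = N^2 \mu_{2a}^2 - N \mu_{4a} . 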
Lemma 2: Given any set of elements a_{ki_1}, a_{ki_2}, ..., a_{ki_m} 
(i_1 \ne i_2 \ne ... \ne i_m) which are raised to integer powers 
p_1, p_2, ..., p_m respectively, where at least one p_j 
(j = 1, 2, ..., m) is odd, then 

    \sum_k a_{ki_1}^{p_1} a_{ki_2}^{p_2} ... a_{ki_m}^{p_m} = 0 . 

Proof: If any power is odd, then the summation over all K signed 
permutations will contain each combination of the other a an equal 
number of times with positive and negative signs. 
Lemma 3: Let a_{ki} be an element in a_k, then 

    a. K^{-1} \sum_k a_{ki}^2 = \mu_{2a} 

    b. K^{-1} \sum_k a_{ki}^4 = \mu_{4a} 

    c. K^{-1} \sum_k a_{ki}^2 a_{ki'}^2 = (N - 1)^{-1} (N \mu_{2a}^2 - \mu_{4a}) 

Proof: 

a. \sum_k a_{ki}^2 contains the square of each element a_i of a exactly 
2^N (N - 1)! times, thus 

    K^{-1} \sum_k a_{ki}^2 = (2^N (N - 1)! / (2^N N!)) \sum_i a_i^2 
                           = \sum_i a_i^2 / N = \mu_{2a} . 

b. \sum_k a_{ki}^4 contains the 4th power of each element of a exactly 
2^N (N - 1)! times, thus, as in 3a, K^{-1} \sum_k a_{ki}^4 = \mu_{4a}. 

c. \sum_k a_{ki}^2 a_{ki'}^2 contains the product of squares of each 
distinct pair of elements in a exactly 2^N (N - 2)! times, thus 

    K^{-1} \sum_k a_{ki}^2 a_{ki'}^2 
        = (2^N (N - 2)! / (2^N N!)) [\sum_i \sum_{i'} a_i^2 a_{i'}^2] 
        = (N - 1)^{-1} (N \mu_{2a}^2 - \mu_{4a}) , 

using Lemma 1 on the term in brackets. 
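Lemma 4: Given a column vector c with elements c_i (i = 1, 2, ..., N) 
with (uncorrected) moments \mu_{pc} = \sum_i c_i^p / N, in particular 
\mu_{2c} and \mu_{4c}, and p an odd positive integer, then 

    a. K^{-1} \sum_k (c'a_k)^p = 0 

    b. K^{-1} \sum_k (c'a_k)^2 = N \mu_{2a} \mu_{2c} 

    c. K^{-1} \sum_k (c'a_k)^4 = N \mu_{4a} \mu_{4c} 
           + 3N (N - 1)^{-1} (N \mu_{2a}^2 - \mu_{4a}) (N \mu_{2c}^2 - \mu_{4c}) 

Proof: 

a. Each term of (c_1 a_{k1} + c_2 a_{k2} + ... + c_N a_{kN})^p has at 
least one odd power, thus each term vanishes by Lemma 2. 

b. Using Lemma 3a, 

    K^{-1} \sum_k (c_1 a_{k1} + c_2 a_{k2} + ... + c_N a_{kN})^2 
        = K^{-1} \sum_k (c_1^2 a_{k1}^2 + ... + c_N^2 a_{kN}^2 + odd powers) 
        = c_1^2 \mu_{2a} + ... + c_N^2 \mu_{2a} = N \mu_{2a} \mu_{2c} . 

c. K^{-1} \sum_k (c'a_k)^4 has terms \sum_i c_i^4 a_{ki}^4, 
3 \sum_i \sum_{i'} c_i^2 c_{i'}^2 a_{ki}^2 a_{ki'}^2, and odd powers. 
Using Lemmas 3b and 3c on the terms in brackets, 

    K^{-1} \sum_k (c'a_k)^4 
        = \sum_i c_i^4 [K^{-1} \sum_k a_{ki}^4] 
          + 3 \sum_i \sum_{i'} c_i^2 c_{i'}^2 [K^{-1} \sum_k a_{ki}^2 a_{ki'}^2] 
          + odd powers 
        = N \mu_{4a} \mu_{4c} 
          + 3N (N - 1)^{-1} (N \mu_{2a}^2 - \mu_{4a}) (N \mu_{2c}^2 - \mu_{4c}) , 

where Lemma 1 has been applied to \sum_i \sum_{i'} c_i^2 c_{i'}^2. 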
Lemma 5: For the vectors a_k, 

    a. K^{-1} \sum_k a_k = 0 

    b. K^{-1} \sum_k a_k a_k' = \mu_{2a} I = (\sigma_a^2 + \bar{a}^2) I 

where I is an Nth order identity matrix. 

Proof: 

a. Each element of a is contained in the sum an equal number 
of times with positive and negative signs. 

b. The diagonal elements of K^{-1} \sum_k a_k a_k' are 
K^{-1} \sum_k a_{ki}^2 = \mu_{2a} and the off diagonals are odd 
powered. Also, \mu_{2a} = \sigma_a^2 + \bar{a}^2. 
Lemma 6: Given an N x N matrix Q of rank r \le N, then 

    a. K^{-1} \sum_k P_k Q P_k' = N^{-1} Tr(Q) I 

and if Q is idempotent (i.e. QQ = Q) then 

    b. K^{-1} \sum_k P_k Q P_k' = (r/N) I 

where I is an Nth order identity matrix. 

Proof: Since each off-diagonal element of Q, q_{ii'} say, is matched with 
-q_{ii'} in all off-diagonal summations, K^{-1} \sum_k P_k Q P_k' is 
at least diagonal. Each diagonal element of K^{-1} \sum_k P_k Q P_k' 
consists of the sum of each diagonal element of Q exactly 
2^N (N - 1)! times, thus K^{-1} \sum_k P_k Q P_k' = N^{-1} Tr(Q) I. If 
QQ = Q, then Tr(Q) = r and K^{-1} \sum_k P_k Q P_k' = (r/N) I. 
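
Lemma 6 can be checked numerically on a very small case. The sketch below is an illustration only: it assumes N = 3 and takes for Q the idempotent matrix X(X'X)^{-1}X' of a straight-line design with x = (1, 2, 3), which has rank 2. Both choices are made here for the example.

```python
from itertools import permutations, product

# Q = X(X'X)^{-1}X' for X = [1, x]; idempotent with rank r = 2.
x = [1.0, 2.0, 3.0]
n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
q = [[1.0 / n + (xi - mean_x) * (xj - mean_x) / sxx for xj in x] for xi in x]
rank = 2

# Average P_k Q P_k' over all K = 2^N N! signed permutations.
# If row i of P_k selects element order[i] with sign signs[i], then
# (P_k Q P_k')[i][j] = signs[i] * signs[j] * Q[order[i]][order[j]].
total = [[0.0] * n for _ in range(n)]
count = 0
for order in permutations(range(n)):
    for signs in product((1.0, -1.0), repeat=n):
        for i in range(n):
            for j in range(n):
                total[i][j] += signs[i] * signs[j] * q[order[i]][order[j]]
        count += 1

average = [[t / count for t in row] for row in total]
trace_q = sum(q[i][i] for i in range(n))
```

The average is diagonal with every diagonal element equal to Tr(Q)/N = r/N = 2/3, as the lemma states.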
Theorem 1 (Case I: Ordinary Least Squares) 

Given the definitions in Tables 1, 3, and 9, then 

    a. ave(b_k) = b 

    b. cov(b_k) = \mu_{2e} C'C = \sigma_e^2 (X'X)^{-1} 

    c. skew(b_{kj}) = 0 

    d. kurt(b_{kj}) = \beta_{2e} \beta_{2c_j} / N 
                      + (3N/(N-1)) (1 - \beta_{2e}/N) (1 - \beta_{2c_j}/N) 

    e. ave(d_k^2) = m + 1 

    f. var(d_k^2) = ((1 - \beta_{2e})/(N-1)) (m+1)^2 
                    + 2 ((N - \beta_{2e})/(N-1)) (m+1) 
                    + \sum_i q_{ii}^2 ( \beta_{2e}(N+2)/(N-1) - 3N/(N-1) ) 

    g. ave(\hat{y}_k) = \hat{y} 

    h. cov(\hat{y}_k) = \mu_{2e} XC'CX' = \sigma_e^2 X(X'X)^{-1}X' 

Note that all moments are central since \bar{e} = 0 and \bar{c}_j = 0 
for all j except j = 0, the intercept. 
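
The closed forms for the kurtosis of b_{kj} and the variance of d_k^2 can be evaluated directly for the Case I example and compared with Table 4. In the sketch below the data are the five points read back from the figures (an assumption), \beta_2 is taken to be the uncorrected kurtosis \mu_4 / \mu_2^2, and the function names are hypothetical.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]              # assumed regressor values
y = [12.0, 0.0, 10.0, 12.0, 21.0]          # assumed responses
n, m = len(y), 1

# Ordinary least squares for the straight line.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * yi for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def beta2(values):
    """Uncorrected kurtosis mu_4 / mu_2^2 (moments taken about zero)."""
    mu2 = sum(val ** 2 for val in values) / len(values)
    mu4 = sum(val ** 4 for val in values) / len(values)
    return mu4 / mu2 ** 2


# Columns of the catchers C = X(X'X)^{-1} and the diagonal of Q = X(X'X)^{-1}X'.
c1 = [(xi - mean_x) / sxx for xi in x]
c0 = [1.0 / n - mean_x * ci for ci in c1]
q_diag = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def kurt_b(beta_e, beta_c, n):
    """Kurtosis of b_kj over all signed permutations (Theorem 1d)."""
    return (beta_e * beta_c / n
            + 3.0 * n / (n - 1) * (1.0 - beta_e / n) * (1.0 - beta_c / n))


def var_d2(beta_e, q_diag, n, p):
    """Variance of d_k^2 (Theorem 1f); p = m + 1 is the rank of Q."""
    sum_q2 = sum(qi ** 2 for qi in q_diag)
    return ((1.0 - beta_e) / (n - 1) * p ** 2
            + 2.0 * (n - beta_e) / (n - 1) * p
            + sum_q2 * (beta_e * (n + 2) - 3.0 * n) / (n - 1))


beta_e = beta2(resid)
kurt_intercept = kurt_b(beta_e, beta2(c0), n)
kurt_slope = kurt_b(beta_e, beta2(c1), n)
variance_d2 = var_d2(beta_e, q_diag, n, m + 1)
```

These values reproduce the kurt(b_{kj}) and var(d_k^2) entries of Table 4 (2.1500, 2.1828 and 1.8058) to the printed precision.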



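Proof: Note that b_k = C'y_k = C'(\hat{y} + P_k e) = b + C'P_k e. 

a. Using Lemma 5a on the term in brackets, 

    ave(b_k) = K^{-1} \sum_k b_k = K^{-1} \sum_k (b + C'P_k e) 
             = b + C'[K^{-1} \sum_k P_k e] = b . 

b. Using Lemma 5b and the fact that \bar{b} = b, 

    cov(b_k) = K^{-1} \sum_k (b_k - b)(b_k - b)' 
             = K^{-1} \sum_k (C'P_k e)(C'P_k e)' 
             = C'[K^{-1} \sum_k P_k e e'P_k']C = C'(\mu_{2e} I)C ; 

since \mu_{2e} = \sigma_e^2 and C'C = (X'X)^{-1}, then 

    cov(b_k) = \sigma_e^2 (X'X)^{-1} . 

c. Noting that c_j is the jth column of C and using Lemma 4a, 

    skew(b_{kj}) = K^{-1} \sum_k (b_{kj} - b_j)^3 
                   / (K^{-1} \sum_k (b_{kj} - b_j)^2)^{3/2} 
                 = [K^{-1} \sum_k (c_j'P_k e)^3] 
                   / (K^{-1} \sum_k (c_j'P_k e)^2)^{3/2} 
                 = 0 . 

d. Using Lemmas 4b and 4c, 

    kurt(b_{kj}) = K^{-1} \sum_k (c_j'P_k e)^4 
                   / (K^{-1} \sum_k (c_j'P_k e)^2)^2 
                 = [N \mu_{4e} \mu_{4c_j} + 3N (N - 1)^{-1} 
                    (N \mu_{2e}^2 - \mu_{4e}) (N \mu_{2c_j}^2 - \mu_{4c_j})] 
                   / (N \mu_{2e} \mu_{2c_j})^2 
                 = \beta_{2e} \beta_{2c_j} / N 
                   + (3N/(N-1)) (1 - \beta_{2e}/N) (1 - \beta_{2c_j}/N) . 

e. Using Lemma 6b, 

    ave(d_k^2) = K^{-1} \sum_k (b_k - b)'(cov(b_k))^{-1}(b_k - b) 
               = K^{-1} \sum_k (C'P_k e)'(\mu_{2e} C'C)^{-1}(C'P_k e) 
               = \mu_{2e}^{-1} e'[K^{-1} \sum_k P_k'QP_k]e 
               = \mu_{2e}^{-1} ((m + 1)/N) e'e 
               = m + 1 . 

f. Since d_k^2 = \mu_{2e}^{-1} e'P_k'QP_k e, and writing a_k = P_k e, 

    var(d_k^2) = K^{-1} \sum_k d_k^4 - (ave(d_k^2))^2 . 

After squaring the term in the first parentheses and 
rearrangement, 

    var(d_k^2) = \mu_{2e}^{-2} K^{-1} \sum_k [ \sum_i q_{ii}^2 a_{ki}^4 
                 + \sum_i \sum_{i'} q_{ii} q_{i'i'} a_{ki}^2 a_{ki'}^2 
                 + 2 \sum_i \sum_{i'} q_{ii'}^2 a_{ki}^2 a_{ki'}^2 
                 + odd powers ] - (m + 1)^2 . 

By Lemma 3b, 

    \mu_{2e}^{-2} \sum_i q_{ii}^2 [K^{-1} \sum_k a_{ki}^4] 
        = (\mu_{4e} / \mu_{2e}^2) \sum_i q_{ii}^2 = \beta_{2e} \sum_i q_{ii}^2 . 

By Lemma 3c and the fact that 
\sum_i \sum_{i'} q_{ii'}^2 = (m + 1) - \sum_i q_{ii}^2, 

    2 \mu_{2e}^{-2} \sum_i \sum_{i'} q_{ii'}^2 [K^{-1} \sum_k a_{ki}^2 a_{ki'}^2] 
        = 2 ((N - \beta_{2e})/(N - 1)) ((m + 1) - \sum_i q_{ii}^2) . 

By Lemma 3c and the fact that 
\sum_i \sum_{i'} q_{ii} q_{i'i'} = (m + 1)^2 - \sum_i q_{ii}^2, 

    \mu_{2e}^{-2} \sum_i \sum_{i'} q_{ii} q_{i'i'} [K^{-1} \sum_k a_{ki}^2 a_{ki'}^2] 
        = ((N - \beta_{2e})/(N - 1)) ((m + 1)^2 - \sum_i q_{ii}^2) . 

By Lemma 2 all odd powers vanish, thus 

    var(d_k^2) = \beta_{2e} \sum_i q_{ii}^2 
                 + ((N - \beta_{2e})/(N - 1)) 
                   ((m + 1)^2 + 2(m + 1) - 3 \sum_i q_{ii}^2) 
                 - (m + 1)^2 . 

Rearranging in terms of powers of m + 1, 

    var(d_k^2) = ((1 - \beta_{2e})/(N-1)) (m+1)^2 
                 + 2 ((N - \beta_{2e})/(N-1)) (m+1) 
                 + \sum_i q_{ii}^2 ( \beta_{2e}(N+2)/(N-1) - 3N/(N-1) ) . 

g. Using Lemma 5a, 

    ave(\hat{y}_k) = K^{-1} \sum_k \hat{y}_k 
                   = K^{-1} \sum_k (\hat{y} + XC'P_k e) 
                   = \hat{y} + XC'[K^{-1} \sum_k P_k e] = \hat{y} . 

h. Using Lemma 5b, 

    cov(\hat{y}_k) = K^{-1} \sum_k (\hat{y}_k - \hat{y})(\hat{y}_k - \hat{y})' 
                   = K^{-1} \sum_k (XC'P_k e)(XC'P_k e)' 
                   = XC'[K^{-1} \sum_k P_k e e'P_k']CX' 
                   = \mu_{2e} XC'CX' 

and thus 

    cov(\hat{y}_k) = \sigma_e^2 X(X'X)^{-1}X' . 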
Corollary 1: 

Given a subset b_s of b, the corresponding subset b_{ks} of b_k, 
the subset cov(b_{ks}) of cov(b_k), and 
d_{ks}^2 = (b_{ks} - b_s)'(cov(b_{ks}))^{-1}(b_{ks} - b_s), 
then 

    ave(d_{ks}^2) = K^{-1} \sum_k (b_{ks} - b_s)'(cov(b_{ks}))^{-1}(b_{ks} - b_s) 
                  = m_s 

and 

    var(d_{ks}^2) = ((1 - \beta_{2e})/(N-1)) m_s^2 
                    + 2 ((N - \beta_{2e})/(N-1)) m_s 
                    + \sum_i q_{sii}^2 ( \beta_{2e}(N+2)/(N-1) - 3N/(N-1) ) 

where m_s is the number of elements in b_s and the q_{sii} are defined 
below. 



Proof: 

Let X be partitioned into (X_s, X_{\bar{s}}) where X_s is the 
N x m_s matrix consisting of the columns of X corresponding to b_s 
and X_{\bar{s}} is an N x m_{\bar{s}} (m_{\bar{s}} = m + 1 - m_s) 
matrix containing the remaining columns. Also, let 

    \tilde{X}_s = X_s - X_{\bar{s}}(X_{\bar{s}}'X_{\bar{s}})^{-1}X_{\bar{s}}'X_s 

and 

    \tilde{y}_s = y - X_{\bar{s}}(X_{\bar{s}}'X_{\bar{s}})^{-1}X_{\bar{s}}'y . 

These values may be substituted for X and y in Table 3 without 
affecting the values of e. Thus, C becomes 

    C_s = \tilde{X}_s(\tilde{X}_s'\tilde{X}_s)^{-1} 

and 

    C_s'\tilde{y}_s = b_s . 

With these substitutions in Table 3, Theorem 1e and 1f follow. The 
matrix Q becomes 

    Q_s = C_s(C_s'C_s)^{-1}C_s' = \tilde{X}_s(\tilde{X}_s'\tilde{X}_s)^{-1}\tilde{X}_s' 

where Q_s has elements q_{sii'} (i, i' = 1, 2, ..., N). The value of 
m + 1 becomes m_s since m_s is the rank of Q_s. 

Note: In most statistical analyses the variance of the intercept b_0 
is not of interest; thus \tilde{X}_s and \tilde{y}_s are X and y 
centered about their respective means and the ave(d_k^2) = m. 
Theorem 2: (Case II: Weighting the Permuted Residuals) 
Given the definitions in Tables 1, 5, and 9, then 

    a. ave(b_k) = b 

    b. cov(b_k) = \mu_{2e} C'C 
                = (\sigma_e^2 + \bar{e}^2) (X'WX)^{-1} X'W^2X (X'WX)^{-1} 

    c. skew(b_{kj}) = 0 

    d. kurt(b_{kj}) = \beta_{2e} \beta_{2c_j} / N 
                      + (3N/(N-1)) (1 - \beta_{2e}/N) (1 - \beta_{2c_j}/N) 

    e. ave(d_k^2) = m + 1 

    f. var(d_k^2) = ((1 - \beta_{2e})/(N-1)) (m+1)^2 
                    + 2 ((N - \beta_{2e})/(N-1)) (m+1) 
                    + \sum_i q_{ii}^2 ( \beta_{2e}(N+2)/(N-1) - 3N/(N-1) ) 

    g. ave(\hat{y}_k) = \hat{y} 

    h. cov(\hat{y}_k) = \mu_{2e} XC'CX' 
                      = (\sigma_e^2 + \bar{e}^2) X(X'WX)^{-1} X'W^2X (X'WX)^{-1}X' 



Note that the moments are not in general central except for 
Proof: With the substitution of definitions from Table 5 for Table 3, 
the proof follows the same steps as Theorem 1 except that 

    \mu_{2e} = \sigma_e^2 + \bar{e}^2 

and 

    C'C = (X'WX)^{-1} X'W^2X (X'WX)^{-1} . 
Theorem 3: (Case III: Permuting the Weighted Residuals) 
Given the definitions in Tables 1, 7, and 9, then 

    a. ave(b_k) = b 

    b. cov(b_k) = \mu_{2e*} C'C = (e'We/N) (X'WX)^{-1} 

    c. skew(b_{kj}) = 0 

    d. kurt(b_{kj}) = \beta_{2e*} \beta_{2c_j} / N 
                      + (3N/(N-1)) (1 - \beta_{2e*}/N) (1 - \beta_{2c_j}/N) 

    e. ave(d_k^2) = m + 1 

    f. var(d_k^2) = ((1 - \beta_{2e*})/(N-1)) (m+1)^2 
                    + 2 ((N - \beta_{2e*})/(N-1)) (m+1) 
                    + \sum_i q_{ii}^2 ( \beta_{2e*}(N+2)/(N-1) - 3N/(N-1) ) 

    g. ave(\hat{y}_k) = \hat{y} 

    h. cov(\hat{y}_k) = \mu_{2e*} XC'CX' = (e'We/N) X(X'WX)^{-1}X' 

Note that the moments are not, in general, central. 
Proof: With the substitution of definitions from Table 7 for Table 3 
and substitution of y*, X*, \hat{y}*, e*, \bar{e}*, \mu_{2e*}, y_k* and 
\hat{y}_k* for their unstarred equivalents, the proof follows the 
same steps as Theorem 1 except that 

    \mu_{2e*} = e'We / N 

and 

    C'C = (X'WX)^{-1} . 